[Editor's Note: This column was largely adapted from Chapter 17 of Managing Microsoft Exchange Server, ISBN 1565925459. The information appears here courtesy of the publisher, O'Reilly & Associates.]
Because this issue of Windows 2000 Magazine highlights disaster recovery, I thought I'd give you a quick rundown of the basic planning procedures to follow before disaster (i.e., a crashed Microsoft Exchange Server system) strikes, as well as the first steps to take after the fall.
Planning Prevents Poor Performance
Failure to plan is the primary cause of permanent Exchange Server data loss. Consider a typical single-site environment with two servers. A fire destroys one server. The server's backup tapes were sitting next to it instead of in an offsite (or fireproof) vault. That server is irrecoverable because its administrator didn't plan adequately, not just because it roasted like a marshmallow. Ask yourself the following questions, then take the appropriate steps based on your answers.
How long can I afford for my server to be down? The less downtime you can tolerate, the more preparation you need. For example, if you can't afford for your mail server to be down for more than 4 hours, you need to think about ways to reduce your recovery time (e.g., using hot spares, clustering) or change your backup strategy to permit faster restores.
Do I have adequate replacement hardware? If your primary server is a large quad-processor box with a 60GB store, what happens when you need to restore all 60GB of data to another machine? The best solution is to keep a clone of your standard-server configuration so that you can use it as a recovery server, but you might need to find creative workarounds if you don't have any spares. Of course, you also need to ensure that you have the right backup hardware and software on your recovery server, or you might not be able to restore your backup.
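To put numbers on the downtime and hardware questions, a rough restore-time estimate helps. The following is a minimal Python sketch, assuming you know (ideally from a timed test restore) your tape drive's sustained restore rate. The 3 MB/s figure is a hypothetical placeholder, and the function names are illustrative only.

```python
# Rough restore-time estimate for capacity planning.
# ASSUMPTION: restore_mb_per_sec is a placeholder; measure your own
# drive's rate during a test restore instead of trusting a spec sheet.


def restore_hours(store_gb: float, restore_mb_per_sec: float,
                  log_replay_hours: float = 0.0) -> float:
    """Hours to copy store_gb back from tape, plus any log-replay time."""
    seconds = store_gb * 1024 / restore_mb_per_sec
    return seconds / 3600 + log_replay_hours


def fits_window(store_gb: float, restore_mb_per_sec: float,
                max_downtime_hours: float,
                log_replay_hours: float = 0.0) -> bool:
    """True if the estimated restore fits inside the tolerable downtime."""
    return restore_hours(store_gb, restore_mb_per_sec,
                         log_replay_hours) <= max_downtime_hours


if __name__ == "__main__":
    # 60GB store and 4-hour window as in the column; 3 MB/s is hypothetical.
    hours = restore_hours(60, 3.0)
    print(f"Estimated restore: {hours:.1f} hours")
    print("Fits 4-hour window:", fits_window(60, 3.0, 4.0))
```

At the hypothetical 3 MB/s rate, copying the 60GB store alone takes roughly 5.7 hours, so a 4-hour tolerance is blown before any log replay starts; that is the kind of result that should push you toward hot spares, clustering, or a faster restore path.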
Do I make regular backups and make them often enough to capture all the changes that occur on my server? Do the backups include system information such as the domain SAM and server Registry? For information about making proper backups, see Getting Started with Exchange, "The Six Deadly Backup Sins," April 2000.
Do I make the right kind of backups? Consider whether your backups (online or offline; full, incremental, or differential) actually capture the data you need to restore. If they don't, come up with a new plan.
Do I regularly test my backups to make sure that they work properly? Do I regularly review the backup logs for errors? Do I practice disaster recovery to prevent surprises? The answer to all three of these questions had better be yes, or you're headed for trouble. Change your ways now, while you still can.
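Reviewing backup logs is easy to skip, so a script can take a first pass for you. This is a minimal sketch, assuming the log is plain text and that problem lines contain words such as "error," "fail," or "abort." Those keywords are hypothetical, not the documented Ntbackup log format, so tune them against a log from a job you know failed. The check deliberately errs toward flagging too much (a line such as "0 errors" would be reported).

```python
# Minimal backup-log review helper.
# ASSUMPTION: plain-text log; KEYWORDS are hypothetical markers, not a
# documented log format. Adjust them after inspecting a known-bad log.
from pathlib import Path

KEYWORDS = ("error", "fail", "abort")


def suspicious_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain any keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(word in lowered for word in KEYWORDS):
            hits.append((number, line.strip()))
    return hits


def review_log(path: str) -> bool:
    """Print suspicious lines; return True only if the log looks clean."""
    text = Path(path).read_text(errors="replace")
    hits = suspicious_lines(text)
    for number, line in hits:
        print(f"line {number}: {line}")
    return not hits
```

A clean result only means no keyword matched. It doesn't replace an actual test restore.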
Are my backup tapes secure? Ideally, keep multiple backup sets and store some of them in a secure offsite location. And be sure you have at least one spare tape drive that can restore the tapes.
Before you can build a truly comprehensive plan, you must understand the mechanics of Exchange disaster recovery. I suggest that you read Microsoft's white paper, "Microsoft Exchange Disaster Recovery" (http://www.microsoft.com/Exchange/techinfo/backuprestore.htm), as soon as you can. In the meantime, the following basics will give you the knowledge you need to get started on an appropriate plan.
What to Restore?
You can cleanly separate recovery operations into two tasks: recovering the OS and recovering the Exchange Server data. Sometimes you need to perform both tasks, and sometimes you need to perform only one task.
The first task is recovering the OS. If the Exchange Server database isn't damaged and you can restore the OS without touching the Exchange Server installation, you're good to go with a simple OS restoration. You're more likely to be in this favorable situation if your disk configuration puts the OS installation, Exchange Server databases, and transaction logs on separate physical disks and if you use recovery methods (e.g., a parallel Windows NT installation) that help you quickly recover NT. I recommend keeping the OS, transaction logs, and Exchange databases on separate physical-disk subsystems whenever possible.
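One way to audit the separate-disks recommendation is to record which physical disk each role lives on and check for overlaps. This sketch assumes you maintain a hypothetical, hand-written map from drive letter to physical disk, because two drive letters can be partitions on the same spindle and so prove nothing by themselves. The paths and disk names are illustrative.

```python
# Check that OS, Exchange databases, and transaction logs live on
# different physical disks.
# ASSUMPTION: disk_of is a hypothetical map you maintain by hand from
# drive letter to physical disk.
import ntpath


def physical_disk(path: str, disk_of: dict[str, str]) -> str:
    """Look up the physical disk behind a Windows path's drive letter."""
    drive = ntpath.splitdrive(path)[0].upper()
    return disk_of[drive]


def layout_problems(paths: dict[str, str], disk_of: dict[str, str]) -> list[str]:
    """Return a message for every pair of roles that share a physical disk."""
    problems = []
    roles = list(paths)
    for i, first in enumerate(roles):
        for second in roles[i + 1:]:
            disk_a = physical_disk(paths[first], disk_of)
            disk_b = physical_disk(paths[second], disk_of)
            if disk_a == disk_b:
                problems.append(f"{first} and {second} share {disk_a}")
    return problems


if __name__ == "__main__":
    disk_of = {"C:": "disk0", "D:": "disk0", "E:": "disk1"}  # hypothetical
    paths = {
        "OS": r"C:\WINNT",
        "databases": r"D:\exchsrvr\mdbdata",
        "logs": r"E:\exchsrvr\mdbdata",
    }
    for problem in layout_problems(paths, disk_of):
        print(problem)
```

In the demo, C: and D: are two partitions on one disk, so the check reports that the OS and the databases share disk0 even though their drive letters differ.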
Let me digress to point out a disaster-recovery obstacle that affects many sites: using a PDC as your Exchange server. To restore an Exchange server running on NT, you must have access to the SAM for the server's domain. If your Exchange server is a member server or BDC, you can probably access the SAM without difficulty (if a domain controller is available when you do the restore). But if your Exchange server is a PDC, beware. If the PDC fails and you need to reinstall NT to fix it, you'll have difficulty recovering your Exchange Server configuration, because reinstalling NT gives you a new SAM database, and the old SIDs that Exchange Server needs disappear.
The second task involves recovering the Exchange Server database and log files. To successfully restore an Exchange Server database, you must follow a couple of ironclad rules:
You must have a complete backup: either a full backup or a full backup combined with the appropriate incremental or differential backups.
You must have already disabled circular logging, and you must have access to the log files, either on their original disk or from a recent backup. (Circular logging overwrites old log files. You can restore a server on which you've enabled circular logging, but unless you disable this feature, you won't have a complete set of log files available at restore time.)
The first rule is self-evident: Without a good backup, you're toast. The second rule makes sense if you think about the transaction log files' purpose: to capture transactions that the Exchange store process hasn't yet committed to the store. If you have a store file backup and a complete set of log files, you can play back the logged transactions to bring the restored database back to the state it was in before the failure. Guess what? If you don't have the log files, you lose any uncommitted transactions.
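The first rule, having a complete backup, comes down to picking the right chain of backup sets. Here is a minimal sketch of the usual rule of thumb: the latest full backup, then either every incremental taken after it, in order, or only the newest differential. The Backup record and tape labels are hypothetical, and the sketch refuses to guess when incrementals and differentials are mixed. It doesn't address the second rule; you still need the transaction logs written after the last backup.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str        # "full", "incremental", or "differential"
    taken: datetime  # when the backup ran
    label: str       # hypothetical tape label


def restore_chain(backups: list[Backup]) -> list[Backup]:
    """Return the backup sets to restore, oldest first."""
    ordered = sorted(backups, key=lambda b: b.taken)
    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup: nothing to restore from")
    base = fulls[-1]  # only the most recent full backup matters
    later = [b for b in ordered if b.taken > base.taken]
    incrementals = [b for b in later if b.kind == "incremental"]
    differentials = [b for b in later if b.kind == "differential"]
    if incrementals and differentials:
        # Mixed schemes need a human decision; don't guess.
        raise ValueError("mixed incremental and differential sets after the last full")
    if differentials:
        # A differential holds everything since the full backup,
        # so only the newest one is needed.
        return [base, differentials[-1]]
    # Each incremental holds only changes since the previous backup,
    # so all of them are needed, in order.
    return [base] + incrementals


if __name__ == "__main__":
    week = [
        Backup("full", datetime(2000, 6, 4), "SUN-FULL"),
        Backup("incremental", datetime(2000, 6, 5), "MON-INC"),
        Backup("incremental", datetime(2000, 6, 6), "TUE-INC"),
    ]
    print([b.label for b in restore_chain(week)])
```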
Restoring the OS
Imagine that you're trying to fix a downed server. Your Exchange Server data is still safe (as far as you can tell), and you still have access to the domain SAM, but you need to reinstall NT to get the server back up. Here's what to do.
- Remove the failed server's computer account from the domain on the PDC (e.g., with Server Manager), then add it back.
- Reinstall NT on the failed server. Join the domain, and use the original machine name.
- Log on to the target machine as a domain administrator.
- Run Exchange Server Setup by using the /r switch, which installs Exchange Server without starting the Exchange Server services and without touching the original Exchange Server databases.
- When Setup prompts you for a server name, be sure the new server name matches the original server name. (The names should match if the NT names are the same; Setup fills in the server name as the default choice.)
- Create a new Exchange Server site. Be sure to use exactly the same site and organization names that you used for the original server. Exchange Server distinguishes between upper- and lowercase, so make sure the new names' capitalization matches that of the original names.
- When Setup prompts you for a service account, use the same service account that you used for the original server.
- Install the same connectors that were on the original server.
- Install the same Exchange Server service pack that was on the original server, running its update program with the /r switch (update /r).
- Reconfigure the Internet Mail Service (IMS), Internet News Service (INS), Microsoft Mail (MS Mail) connector, and any third-party connectors, because they might store configuration parameters in the Registry, which the NT reinstall replaced.
- Run the Exchange Performance Optimizer.
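Because Setup must reproduce the original identity exactly, it can help to record the original values in your recovery plan and compare them before you type anything into Setup. This sketch uses hypothetical field names and example values. The comparison is exact and case-sensitive, since the site and organization names must match in capitalization, and it flags near-misses that differ only in case.

```python
# Hypothetical values recorded for the original server.
ORIGINAL = {
    "server": "MAIL01",
    "site": "Headquarters",
    "organization": "Example Corp",
    "service_account": r"CORP\ExchService",
    "service_pack": "SP3",
}


def mismatches(original: dict[str, str], planned: dict[str, str]) -> list[str]:
    """Describe every field whose planned value differs from the original."""
    problems = []
    for field, expected in original.items():
        actual = planned.get(field)
        if actual == expected:  # exact, case-sensitive match
            continue
        hint = ""
        if actual is not None and actual.lower() == expected.lower():
            hint = " (capitalization differs)"
        problems.append(f"{field}: expected {expected!r}, got {actual!r}{hint}")
    return problems


if __name__ == "__main__":
    planned = dict(ORIGINAL, site="headquarters")
    for problem in mismatches(ORIGINAL, planned):
        print(problem)
```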
You now have a clean installation of NT and Exchange Server. However, don't start the Exchange services yet. If your Exchange Server database and log files are intact, you're in good shape. If not, you'll still need to reload the Exchange Server store data from your backups.
- If you have copies of transaction logs that you generated after the original backup, copy them to the recovery server's log directories.
- If you have an online backup, restore it using whatever backup tool you used to make it. Tell the backup software to restore the private and public Information Store (IS) databases. If you're using Ntbackup, select the Start Services After Restore check box. If you don't have transaction logs from after the original backup, select the Erase All Existing Data check box.
If you have an offline backup, make sure the Exchange Server services on the recovery server are still stopped. Next, copy the database and log files to their proper locations, start the System Attendant and Directory Service (DS), then run isinteg -patch. (For information about running Isinteg, see Getting Started with Exchange, "The Sorcerer's Apprentices," May 2000.) After Isinteg finishes, start the IS.
- Open a mailbox's Properties sheet and check the Primary Windows NT Account field to verify mailbox account associations. (If you used the correct domain SAM, the account will be correct. If the account is incorrect, your directory data didn't restore properly. At this point, reread the Microsoft white paper, or run—don't walk—to call Microsoft Product Support Services—PSS.)
- Use a client such as Outlook, Outlook Web Access (OWA), or the Exchange client to verify that you can log on as a user, see calendar data, and exchange mail with other users.
- Repeat the client check on another workstation, just to be sure that Exchange Server is functioning properly.
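The online and offline branches of the data restore can be captured as a checklist generator, which is handy for a printed recovery plan. This is a sketch only: the step wording paraphrases the column, the function name is hypothetical, and the code doesn't drive Ntbackup or any other tool.

```python
def plan_restore(backup_type: str, have_later_logs: bool) -> list[str]:
    """Return the store-restore steps as a checklist (paraphrased)."""
    steps = []
    if have_later_logs:
        steps.append("Copy post-backup transaction logs to the recovery server's log directories")
    if backup_type == "online":
        steps.append("Restore the private and public IS databases with the original backup tool")
        steps.append("Ntbackup: check 'Start Services After Restore'")
        if not have_later_logs:
            # Only wipe existing data when no newer logs need replaying.
            steps.append("Ntbackup: check 'Erase All Existing Data'")
    elif backup_type == "offline":
        steps += [
            "Confirm all Exchange services are stopped",
            "Copy database and log files to their original locations",
            "Start the System Attendant and DS",
            "Run isinteg -patch",
            "Start the IS",
        ]
    else:
        raise ValueError(f"unknown backup type: {backup_type!r}")
    steps.append("Verify mailbox Primary Windows NT Account associations")
    steps.append("Log on with a client from two workstations and exchange test mail")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_restore("online", have_later_logs=False), 1):
        print(f"{number}. {step}")
```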
Restoring the Database and Log Files
How hard you need to work to restore an Exchange Server backup depends on how you stored your database and log files (i.e., whether your databases and transaction logs are on separate physical disks and, if so, whether one or both disks failed). Repairing a failed database disk is probably the easiest type of recovery. All you need do is perform the following steps:
- Use the Control Panel Services applet to disable and stop the System Attendant service. (Disabling the service keeps it from accidentally restarting before you complete the restore.)
- Replace the failed disk and keep the original drive letter. This step is important—if the drive letter changes, Exchange Server can't use the logs.
- Create an Exchange Server directory structure identical to the structure on the failed disk. (You can cheat and look in the Registry to get the correct structure.) Typically, you need to create the exchsrvr directory with subdirectories mdbdata and dsadata.
- Enable and restart the System Attendant, then restore the databases from your most recent backup. If possible, use an online backup. You don't need to restore the transaction logs, because the current logs are still intact on the log disk.
- Restart the DS and IS services, making sure the System Attendant is running first. If you used an offline backup, don't forget to run isinteg -patch before you start the IS. When the IS starts, it will replay the transaction logs and bring the restored DS or IS up-to-date.
- Check the event log to make sure everything went smoothly. Exchange Server will record an event for each log file that it processed successfully. Make sure all the log files are listed.
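Confirming that every log was replayed is easier if you first check that no generation is missing from the log directory. The sketch assumes Exchange Server 5.5-style names (EDB.LOG for the current log, EDB plus five hexadecimal digits for older generations); verify that against your own log directory before relying on it. The replayed list is whatever you transcribe from the event log, so treat that input as hypothetical.

```python
import re

# ASSUMPTION: EDB.LOG is the current log; older generations are named
# EDB + five hex digits + .LOG. Reserve logs (e.g., RES1.LOG) are ignored.
LOG_NAME = re.compile(r"^EDB([0-9A-F]{5})\.LOG$", re.IGNORECASE)


def generation_numbers(filenames: list[str]) -> list[int]:
    """Extract and sort generation numbers from numbered log file names."""
    numbers = []
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:
            numbers.append(int(match.group(1), 16))
    return sorted(numbers)


def missing_logs(filenames: list[str]) -> list[str]:
    """Names of generations missing between the lowest and highest found."""
    numbers = generation_numbers(filenames)
    if not numbers:
        return []
    present = set(numbers)
    return [f"EDB{n:05X}.LOG"
            for n in range(numbers[0], numbers[-1] + 1)
            if n not in present]


def not_replayed(on_disk: list[str], replayed: list[str]) -> list[str]:
    """Numbered logs on disk that never showed up in the event log."""
    seen = {name.upper() for name in replayed}
    return [name for name in on_disk
            if LOG_NAME.match(name) and name.upper() not in seen]
```

A gap in the sequence means the replay stopped short, which is exactly the situation where you lose transactions.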
If your transaction log disk failed, you'll probably end up losing some data. When you lose the disk that holds the IS logs, perform the following steps:
- Use the Control Panel Services applet to disable and stop the System Attendant service.
- Replace the failed disk, then create a new logical disk with the same drive letter as the original disk.
- Format the new disk and create an Exchange Server directory structure identical to the structure on the failed disk. In particular, you need to create the exchsrvr directory with subdirectory mdbdata.
- If you're doing an online restore, enable and restart the System Attendant service.
- Back up the IS databases, either online or offline.
- Restore the most recent online IS database backup.
- If you haven't already done so, enable the System Attendant service, then start the System Attendant. After the System Attendant is running, restart the DS and IS services. When the IS starts, it will contain the data from only the most recent backup.
- Check the event log to make sure everything went smoothly. Exchange Server will record an event for each log file that it successfully processed.
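Both procedures depend on starting services in the right order: the System Attendant first, then the DS, then the IS, with isinteg -patch slotted in before the IS after an offline restore. This sketch encodes that order as a dependency graph with Python's standard graphlib module. The dependency map is an assumption based on the order this column uses, so confirm it against your server's actual service dependencies.

```python
from graphlib import TopologicalSorter

# Each service maps to the services that must already be running.
# ASSUMPTION: mirrors the order this column uses; confirm on your server.
DEPENDS_ON = {
    "System Attendant": set(),
    "Directory Service": {"System Attendant"},
    "Information Store": {"Directory Service"},
}


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Services in an order that starts every dependency first."""
    return list(TopologicalSorter(depends_on).static_order())


def start_checklist(depends_on: dict[str, set[str]], offline_restore: bool) -> list[str]:
    """Start steps, with isinteg -patch inserted before the IS if needed."""
    steps = []
    for service in start_order(depends_on):
        if service == "Information Store" and offline_restore:
            steps.append("run isinteg -patch")
        steps.append(f"start {service}")
    return steps


if __name__ == "__main__":
    for step in start_checklist(DEPENDS_ON, offline_restore=True):
        print(step)
```

A topological sort also raises an error if the map ever contains a cycle, which is a useful sanity check when you extend it with connector services.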
What if the logs and databases resided on the same physical disk or if both disks failed? Just follow both sets of steps, in order.
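Both procedures also have you rebuild the Exchange Server directory structure on the replacement disk before restoring. The sketch below creates the typical subdirectories the column names (exchsrvr\mdbdata and exchsrvr\dsadata on the database disk, exchsrvr\mdbdata on the log disk). Those are only the typical layout; as the column suggests, read the real paths from the Registry first. The demo runs against a scratch directory, not a real disk.

```python
import tempfile
from pathlib import Path

# Typical subdirectories per role; confirm the real paths in the Registry.
TYPICAL_DIRS = {
    "database": ["exchsrvr/mdbdata", "exchsrvr/dsadata"],
    "logs": ["exchsrvr/mdbdata"],
}


def recreate_structure(root: str, roles: list[str]) -> list[Path]:
    """Create the typical directories for each role under root; return them."""
    created = []
    for role in roles:
        for relative in TYPICAL_DIRS[role]:
            path = Path(root) / relative
            path.mkdir(parents=True, exist_ok=True)  # safe to rerun
            if path not in created:
                created.append(path)
    return created


if __name__ == "__main__":
    # Demo against a scratch directory instead of a real replacement disk.
    with tempfile.TemporaryDirectory() as scratch:
        for path in recreate_structure(scratch, ["database", "logs"]):
            print(path)
```

When the logs and databases share one disk, passing both roles for the same root creates the union of the directories without duplicating mdbdata.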
Time to Get Busy
I can't cover everything about Exchange Server disaster recovery in this short column, but I've introduced you to the basics. I encourage you to study this article and read (or reread) Microsoft's disaster-recovery white paper, then write a detailed custom recovery plan and regularly practice recovery. Also, a $200 call to PSS for a helpful walk through the recovery process might be the cheapest insurance you ever buy. (For tips to get the most out of a call to PSS, see Getting Started with Exchange, "7 Steps to Using Tech Support," June 2000.) The help you can get now will be less expensive than the help you'll need after a bungled restoration.
Correction: The URL for Microsoft's white paper "Microsoft Exchange Disaster Recovery" was originally printed incorrectly. The correct URL is http://www.microsoft.com/Exchange/techinfo/backuprestore.htm.