Many Exchange Server administrators' worst nightmare is a database failure that deprives users of email. Even more dire would be a subsequent tape failure that would render the backup invalid, but for most administrators, imagining the loss of email service is bad enough. The first step in getting users back online is to restore the database, but today's mailbox stores are often very large and take a long time to restore—and you can begin the restore operation only after you fix the problem that caused the failure, which could be hardware-related and thus require spare parts. All in all, a database failure is enough to ruin anyone's day.
Because email is now a mission-critical application for many organizations, disaster recovery is an important consideration when planning Exchange deployments. One approach is to install a recovery server in every important location and use this server for restoring databases and recovering mailbox data while the original server is repaired. This approach is effective, but it's costly because you need to deploy and maintain additional hardware or be able to source the necessary hardware quickly if a failure occurs. You also have to implement a parallel Active Directory (AD) environment to be able to restore mailbox stores to the recovery server.
The ability to perform all the necessary operations on one server would be nice, and that's just what the Recovery Storage Group (RSG) feature of Exchange Server 2003 lets you do. Think of the RSG as a specially tailored version of a regular storage group (SG) to which you can restore a backup copy of a mailbox store. You can then use Exchange System Manager (ESM) to work with the data in the mailboxes that are in the database that you load into the RSG. To get you started using the RSG, I explain its essential details.
Two fundamental advances helped the RSG become reality. The first is hardware capability—the majority of servers in use today can easily handle the demands of an online restore and recovery operation while still providing good service to users. The second is the change in the Store architecture in Exchange 2000 Server that lets Exchange support more than one mailbox store on a server. The new architecture allowed for up to 20 SGs on a server, each supporting up to five databases. However, an excessive amount of memory is required to support such a large number of databases, so Microsoft settled on a maximum of four SGs. Exchange 2000 uses a special fifth SG for restore operations, but we had to wait until Exchange 2003 for the full implementation of the RSG. Note that although Exchange 2003, Standard Edition supports only one SG, it can still use the RSG.
Although the RSG is similar to a typical SG, Microsoft specifically designed it for temporary use by recovery operations, and you can't use the RSG in the same way as other SGs. Many features (such as support for access through Internet protocols) are disabled to reduce the overhead of running the RSG and to avoid interference with the operational environment. For example, you can't create new mailboxes in a mailbox store in the RSG, nor can users log on to send mail or otherwise access mail in mailboxes in the RSG by using Outlook or other email clients. Exchange doesn't apply system policies to a mailbox store in the RSG, nor does it include the mailbox store in daily background maintenance operations (such as online defragmentation). Mailboxes are present in the RSG, but they're just containers for messages and attachments that you want to recover and move into a regular mailbox store that's active on the server. In fact, the only access is through the Exchange 2003 version of ESM or a utility program such as Exchange Server Mailbox Merge Wizard (ExMerge), which you can download from http://www.microsoft.com/exchange/downloads/2003/default.mspx.
You can use the RSG to recover data for specific users, but the most common use is to recover mailboxes from an entire store that a failure has rendered inoperative. For the RSG to recover data, it must be able to link a mailbox to a valid user and that mailbox must belong to a database on a server that's in the same administrative group as the RSG server. The RSG uses two important AD properties in its operations:
Other means are available to recover mailboxes for specific users. If you delete a mailbox in error, you can recover it (provided you realize the mistake and recover the mailbox before the mailbox-retention period expires—the default period is 30 days) with ESM's Mailbox Recovery Center. If users have deleted some items that they need, they can usually recover the items themselves by using the Recover Deleted Items feature of Outlook or Outlook Web Access (OWA). If users use another client, such as Outlook Express, an administrator can easily recover the items by using OWA. Of course, recovery depends on users realizing that they want an item before its retention period expires, but most companies have a retention period of at least 7 days, and if a user doesn't realize that an item is important within a week, then maybe it really isn't so important.
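The retention windows described above come down to simple date arithmetic. The following is a minimal sketch in Python, assuming the 30-day mailbox-retention default and the 7-day item-retention figure mentioned in the text; the function name and the sample dates are hypothetical, and real values are configured per mailbox store. The sketch treats the window as closing exactly at the end of the retention period, which is a simplification.

```python
from datetime import date, timedelta

# Figures taken from the text; actual values are set per mailbox store.
MAILBOX_RETENTION_DAYS = 30
ITEM_RETENTION_DAYS = 7


def still_recoverable(deleted_on: date, today: date, retention_days: int) -> bool:
    """Return True while the retention window for a deletion is still open.

    Simplifying assumption: the window closes exactly retention_days after
    the deletion date.
    """
    return today <= deleted_on + timedelta(days=retention_days)


# Hypothetical example: something deleted on March 1, checked on March 20.
deleted = date(2005, 3, 1)
checked = date(2005, 3, 20)
print(still_recoverable(deleted, checked, MAILBOX_RETENTION_DAYS))  # True
print(still_recoverable(deleted, checked, ITEM_RETENTION_DAYS))     # False
```

A deleted mailbox would still be within reach at that point, but a deleted item would not, which is when recovering data through the RSG becomes the fallback.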
Although the RSG can handle many recovery situations, it can't recover a server from a catastrophic failure because it depends on an operating Exchange server. For example, if your server requires a total rebuild after a catastrophic crash, you can't use the RSG (unless another server is present in the administrative group). Instead, you'll have to rebuild the server, install the OS, install Exchange, then perform a typical recovery using the last good backup. Another situation that RSG can't help with is the recovery of public-folder data. If you've lost some data from a public folder, you must recover the data from backup media.
Email Dial-Tone Service
People now consider email a daily necessity rather than a luxury. If users lose service, they want it restored as quickly as possible, the same way they'd want their telephone, electricity, or water service restored. Enabling fast recovery is the purpose of the RSG. It lets Exchange restore service quickly following an outage so that users can send and receive email far faster than if you had to restore a corrupted database on a separate server—even if they have to wait for the full recovery of their mailbox contents. Microsoft refers to the RSG's function as enabling dial-tone recovery—that is, enabling fast recovery of service with a temporary reduction in functionality until the full recovery operation is finished. The basic steps to perform an Exchange dial-tone recovery with the RSG are as follows:
1. Assess, then fix any underlying hardware problem that might have contributed to the database failure. Attempting to recover data when a hardware problem still lurks is pointless.
2. If the database is still mounted, dismount it and copy the failed database files to a safe location (you might want access to them for diagnostic purposes), then delete the files from their original location. In most cases, the Store won't be able to mount the databases because of whatever problem caused the failure, so you only need to copy the files. Make sure that you copy the transaction logs as well.
3. Attempt to mount the mailbox store. ESM will detect that the files aren't present and will offer to create a new database, as Figure 1 shows. Accept the offer, and Exchange will proceed to create a new (and empty) mailbox store, streaming file, and transaction logs in the production location.
No mailboxes are present in the new dial-tone database, but Exchange will create them as soon as they're required (e.g., when a user logs on to connect, when a message arrives for delivery to a mailbox). Some administrators recommend that you send a message to all the mailboxes in the affected database to force Exchange to create the mailboxes. If you don't have a mailbox list, you can generate one with ESM's Export List function (on the Action menu), but only after you've recovered the database into the RSG. Another thing to bear in mind is that any existing rules, custom folder views, personal forms, OWA settings, and junk-mail settings are no longer available because they're part of the original mailbox's properties. You need to set user expectations here—otherwise you might have unhappy users who don't understand why their mailboxes function differently. Users will get their settings back if you swap the recovered database back into production—more about this later.
Users can now log on to send and receive email messages. They don't have access to any messages in the failed database, but they have email dial-tone service. Anyone who uses Microsoft Office Outlook 2003 in cached Exchange mode or an earlier version of Outlook with an offline folder store (OST) must work online or recreate the OST because the Messaging API (MAPI) token in the old client OST doesn't match the globally unique identifier (GUID) of the user's mailbox in the new database. Given the size of many OSTs today, the prospect of recreating one isn't attractive, so later I describe a tactic for minimizing the problem.
4. After Exchange creates the new mailbox store and users can send and receive messages, you can create the RSG by right-clicking the name of the server that you want to host the RSG and selecting New, Recovery Storage Group, as Figure 2 shows. You should place the directories for the RSG on volumes that have sufficient space to hold the recovered databases and logs from the backup set. For the best performance, put the recovered files on a volume different from the one your old database is on. However, if you plan to swap the recovered database back into production (to recover user settings), doing so is easier if the databases are on the same volume. Specify the same filename for the new database as you used for the old database; this practice will prevent problems if you elect to move the recovered copy back into production.
5. The next step is to add an entry for the failed database into the RSG by selecting it from a list of stores. As Figure 3 shows, Exchange reports the software version that the server runs (in this case, the version is 7226, meaning Exchange 2003 Service Pack 1, or SP1) and lists the mailbox stores that you can recover to the target server. You can recover a mailbox store only from a server in the same administrative group as the recovery server. The treeview at the left side of Figure 3 shows three servers, each of which supports one mailbox store. These are the mailbox stores that ESM can recover, as the dialog box at the right side of Figure 3 shows.
6. To recover the physical copy of the database, you can either copy the files to the location that you specified for the RSG or restore the last good copy of the failed database to the RSG. The copy option is available only if you have an offline full backup; otherwise, some data will likely be missing. Before beginning the restore operation, make sure that the "This database can be overwritten by a restore" option in ESM is selected. To avoid file conflicts, remove all traces of any database that's currently mounted in the RSG, including deleting the database files and transaction logs belonging to that database. If you don't need to restore any incremental backups to recover transaction logs generated since the last good backup was taken, you can select the Last Restore Set option so that the Store can roll forward the transactions from the logs into the database after the restore is finished. The RSG has its own set of transaction logs that have an R00 prefix, so the Store first connects to the recovered transaction logs, replays any outstanding transactions, then connects to the RSG logs for further transactions.
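Two of the steps above lend themselves to a small script: preserving the failed files (step 2) and confirming that the RSG volume has room for the restore (step 4). The following Python sketch is illustrative only. The function names, the file patterns (database, streaming file, transaction logs, checkpoint file), the 20 percent headroom, and the assumption that all files sit in one directory are mine, not part of Exchange; adjust them to your own layout.

```python
import shutil
from pathlib import Path

# Assumed file types for an Exchange 2003 store: database (.edb),
# streaming file (.stm), transaction logs (.log), checkpoint file (.chk).
PATTERNS = ("*.edb", "*.stm", "*.log", "*.chk")


def preserve_failed_files(db_dir: Path, safe_dir: Path) -> list:
    """Copy database-related files from db_dir to safe_dir.

    The originals are left in place; delete them only after you have
    checked the copies. Returns the paths of the copies.
    """
    safe_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in PATTERNS:
        for src in sorted(db_dir.glob(pattern)):
            dst = safe_dir / src.name
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            if dst.stat().st_size != src.stat().st_size:
                raise IOError(f"size mismatch after copying {src.name}")
            copied.append(dst)
    return copied


def has_room(target_dir: Path, needed_bytes: int, headroom: float = 1.2) -> bool:
    """Check that the volume holding target_dir can take the restored files.

    headroom is an arbitrary safety margin, not an Exchange requirement.
    """
    return shutil.disk_usage(target_dir).free >= needed_bytes * headroom
```

Copying rather than moving keeps the originals available until the copies are verified, which matters if you later want the failed files for diagnostics.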
After you've successfully recovered a database into the RSG, you can do one of two things: Move the newly recovered database back to the production location and resume operations, or merge data from the recovered database in the RSG back into the mailboxes in the dial-tone database that's now in the production location. The first option is viable if users haven't connected to the dial-tone database to send and receive email. If users have connected to the dial-tone database, leave them alone and recover and merge data in the background. This approach has an impact on OSTs and the contents of mailboxes, so you might want to wait until the system is quiet, then switch the recovered database over into the production location and move the dial-tone database into the RSG. The logic here is that users want their original mailboxes restored as quickly as possible and the dial-tone database invariably has less data to process.
Solving the OST Problem
If you follow standard RSG operating procedures and create a new dial-tone database to recommence operations, users won't be able to synchronize their existing OST with their new mailbox in the dial-tone database because the MAPI ID stored in the OST won't match the MAPI ID of the new mailbox. Users can continue to work online to send and receive messages, but they can't work in cached Exchange mode and they won't have access to the messages in and the settings of their old mailbox. Alternatively, they can delete their old OST and create a new one that they can synchronize with their new dial-tone database mailbox and that will let them work in cached Exchange mode, but they'll still lose access to all their old messages as well as the rules and permissions that existed in the old mailbox.
Recreating an OST from scratch isn't a big deal if your users have small mailboxes, but it becomes a huge problem if they have large (more than 200MB) mailboxes. Thus, it's often better to switch the dial-tone database out of production and replace it with the recovered database as quickly as you can because then the MAPI ID matches and normal operations can resume. Of course, you still have to recover and merge in the messages that users created and sent while the dial-tone database was active, but these messages typically represent far less data to process and cause much less disruption than the alternative.
The big problem with switching databases around is that you must use command-line commands to copy the files—no UI, wizard, or other automated procedure is available to help. To swap the databases you need to:
1. Take a full online backup of the dial-tone database.
2. Stop the Information Store service. This halts access to all stores on the server, but it's the safest way to ensure that the databases have been correctly closed and that all files are available for copying.
3. Check the dial-tone database and recovered database to ensure that they shut down cleanly. The easiest way to do so is to run the Eseutil utility with the /mh (check database header) switch and scan the output to validate that it reports a clean shutdown.
4. Copy the dial-tone database and its transaction logs to a temporary location.
5. Copy the recovered database and its transaction logs from the RSG location to the production location.
6. Copy the dial-tone database and its transaction logs from the temporary location to the RSG location.
7. Use ESM to select the "This database can be overwritten by a restore" option for both the dial-tone database and recovered database so that the Store will mount the databases successfully when it restarts.
8. Restart the Information Store service. Users should now be able to access all the data in their mailboxes as of the last full backup plus any incremental backups that you recovered. Users can't yet access any messages that they sent or received while the dial-tone database was in place.
9. Extract message data from the dial-tone database and merge it back into the mailboxes in the production database. You can use ESM to view mailboxes in the dial-tone database in the RSG; select the mailboxes that you want to recover, as Figure 4 shows; then use ExMerge or the Exchange Task Wizard in Exchange 2003 SP1 to merge the dial-tone mailbox contents into the live-database mailboxes to provide the users with full copies of their mailboxes. For more information about using Exchange 2003 SP1's Exchange Task Wizard to merge the mailbox contents, see the Web-exclusive sidebar "Exchange Task Wizard Merges Mailboxes" (http://www.windowsitpro.com/microsoftexchangeoutlook, InstantDoc ID 44499).
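Because the swap has to be done by hand, scripting the header check and the three-way file rotation (steps 3 through 6) can reduce the chance of a slip. The Python sketch below is one possible approach, not a Microsoft-supplied procedure. It assumes that eseutil is on the PATH, that the /mh output contains a "State:" line, that each location is a single directory, and that the Information Store service is already stopped. All function names are hypothetical. It moves files rather than copying them so that no stale files are left behind, which differs slightly from the copy wording in the steps; rely on the backup from step 1 as the safety net.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def shutdown_state(header_text: str) -> Optional[str]:
    """Extract the value of the 'State:' line from 'eseutil /mh' output."""
    m = re.search(r"^\s*State:\s*(.+?)\s*$", header_text, re.MULTILINE)
    return m.group(1) if m else None


def is_clean(db_file: Path) -> bool:
    """Run 'eseutil /mh' against db_file and check for a clean shutdown."""
    result = subprocess.run(
        ["eseutil", "/mh", str(db_file)],
        capture_output=True, text=True, check=True,
    )
    return shutdown_state(result.stdout) == "Clean Shutdown"


def move_files(src: Path, dst: Path) -> None:
    """Move every file (not subdirectories) from src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for f in list(src.iterdir()):
        if f.is_file():
            shutil.move(str(f), str(dst / f.name))


def swap_databases(production: Path, rsg: Path, temp: Path) -> None:
    """Rotate files: dial-tone to temp, recovered to production, dial-tone to RSG."""
    if temp.exists() and any(temp.iterdir()):
        raise ValueError("temporary location must be empty before the swap")
    move_files(production, temp)   # step 4: park the dial-tone files
    move_files(rsg, production)    # step 5: recovered database goes live
    move_files(temp, rsg)          # step 6: dial-tone files into the RSG
```

A typical run would call is_clean on both database files first and proceed to swap_databases only if both report a clean shutdown.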
The RSG is an important advance in Exchange disaster-recovery technology. The one thing you can be certain of is that some sort of disaster will happen at some time during a deployment. Although storage technology has improved greatly in its reliability over the past few years, a storage failure that affects a mailbox store has a huge impact on users, so anything that helps you restore service promptly is welcome. The RSG meets that need effectively. However, as with all technology, the RSG has its own set of kinks and habits, and you have to understand what's going on behind the scenes to use the RSG effectively. For more information about things to keep in mind when using the RSG, see the Web-exclusive sidebar "RSG Considerations" (http://www.windowsitpro.com/microsoftexchangeoutlook, InstantDoc ID 44500). Getting some practice in using the RSG on a test server, especially retrieving data and putting it back into user mailboxes, is a good idea.