When you understand Microsoft Exchange 2000 Server's database technology and storage procedures, you can determine the best ways to recover your system after a disaster and increase your system's reliability. Study the recovery scenarios for your Exchange 2000 system and learn how the Exchange 2000 database engine backs up and restores data. Then, you can plan best practices and choose technologies to improve your Exchange 2000 deployments.
Exchange 2000 Recovery Scenarios
Disaster-recovery planning, testing, and operations using Exchange 2000 cover two scenarios: server catastrophe and Information Store (IS) recovery. If a disaster destroys your Exchange server, you need to recover server capabilities from scratch. You must pay attention to the OS (Windows 2000), Microsoft IIS, Exchange 2000, and the IS. Exchange 2000 closely integrates with two Win2K components: Active Directory (AD) and the IIS metabase. Responsibility for most AD disaster-recovery planning probably won't lie with Exchange 2000 administrators. However, because Win2K's AD is built on the same Extensible Storage Engine (ESE) database technology that Exchange 2000 uses, some organizations might call on Exchange 2000 administrators to help infrastructure staff members (who typically have AD responsibility) plan for AD backup, recovery, and maintenance. You need to understand how to perform AD recovery operations on an Exchange 2000 server. Regardless of whether you rely on the infrastructure staff to handle recovery, you must factor AD into your Exchange 2000 disaster-recovery plans. In most cases, however, AD recovery will affect Exchange 2000 server recovery only if the Exchange 2000 server acts as a Win2K domain controller or Global Catalog (GC) server. (I don't recommend placing the Exchange 2000 server in those roles.)
The IIS metabase, which stores static configuration information for protocols and virtual servers, is an important factor in Exchange 2000 disaster-recovery planning. Exchange 2000 stores information about the protocols it uses, as well as about SMTP or other Exchange 2000 virtual servers, within IIS. Exchange Server administrators didn't need to plan for IIS metabase disaster recovery in earlier versions of Exchange Server; however, IIS is an integral part of Exchange 2000.
IS recovery is also a vital part of Exchange 2000 server restoration. Earlier versions of Exchange Server included the public (pub.edb) and private (priv.edb) stores. But because Exchange 2000 provides for multiple storage groups (SGs) and lets you configure multiple databases per SG, the program complicates disaster-recovery planning. For information about Exchange 2000's storage procedures, see "Exchange 2000 Storage Exposed, Part 1," July 2000. You need to be able to recover all SGs and databases, one particular SG, a message database (MDB), a mailbox, or even a message or document. Table 1 highlights the considerations for each Exchange 2000 recovery scenario.
In Exchange 2000, the disaster-recovery API that Microsoft provides as part of Exchange Server gets a face-lift. The ESE in Exchange 2000 makes an online backup API available through the esebcli2.dll DLL. This API lets Exchange Server service users while backup and restore operations occur. Earlier versions of Exchange Server have only one ESE instance; therefore, the IS services go offline during restore operations, and the server goes down until the recovery completes. Because Exchange 2000 has multiple ESE instances (i.e., SGs), one SG can recover while other SGs are online servicing users. Microsoft adapted the API to allow concurrent operations and to manage multiple SGs, MDBs, and log-file sets. Exchange 2000 also backs up the Key Management Server (KMS) and the Site Replication Service.
The ESE recovery API in Exchange 2000 allows more granularity than the API in earlier versions of Exchange Server allowed. You can back up or restore an entire SG or one MDB. In addition, a database in earlier versions of Exchange Server consisted of one file (i.e., the .edb file), but a database in Exchange 2000 is a set that includes an .edb file and a streaming (.stm) file.
Exchange 2000's backup-and-restore technology is similar to the technology in earlier Exchange Server versions, with a notable exception: SGs and multiple MDBs substantially affect the way the API works. In a forthcoming service pack, Exchange 2000 will support snapshot backups. Table 2 compares Exchange 2000's backup options.
The backup or combination of backups you select affects operational procedures, training, tape management, and restore times. Table 3 shows basic backup strategies for Exchange 2000. The table doesn't include a snapshot strategy because Microsoft is still deciding how to support the snapshot option in Exchange 2000. The Copy backup, which Table 3 also omits, is an archival or point-in-time copy option, not a disaster-recovery operation.
Backup strategies present trade-offs in restore operations. If you select a daily normal backup (Strategy 1), you deal with one tape and typically a fixed time and volume of data, so you can easily plan your recovery windows. If you combine a normal backup on the first day of your backup cycle with daily incremental backups thereafter (Strategy 2), the first day's advantages are similar to those in Strategy 1. Incremental backups on subsequent days run much faster because Exchange 2000 backs up only each day's log files. However, recovery is more difficult to manage because it requires multiple tapes (the normal backup plus every incremental backup made since) and takes more time. Strategy 2 also increases the possibility of operator error or media failure. Strategy 3 is the middle ground between Strategies 1 and 2. With Strategy 3, you perform a normal backup on the first day and differential backups on subsequent days. This procedure requires two tapes for recovery: Tape 1 contains the normal backup from the first day (i.e., database, logs, and patch files), and Tape 2 contains all the log files that Exchange 2000 created since the first day. Compared with Strategy 1, Strategy 3 reduces daily backup time and data volume; compared with Strategy 2, it needs only two tapes for a restore, which reduces errors and the possibility of media failure.
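To make the restore trade-off concrete, the following minimal sketch lists the backups you would need to load under each of the three strategies. The function name, the day numbering, and the assumption that a failure happens after that day's backup finished are illustrative choices, not part of Exchange 2000 or any backup product.

```python
from typing import List


def tapes_for_restore(strategy: int, failure_day: int) -> List[str]:
    """Return the backups needed to restore after a failure on failure_day.

    Day 1 is the first day of the backup cycle. The sketch assumes the
    failure happens after that day's backup completed.
    """
    if failure_day < 1:
        raise ValueError("failure_day must be 1 or greater")
    if strategy == 1:
        # Daily normal backup: only the most recent normal backup is needed.
        return [f"normal(day {failure_day})"]
    if strategy == 2:
        # Normal on day 1, then every incremental since, applied in order.
        return ["normal(day 1)"] + [
            f"incremental(day {d})" for d in range(2, failure_day + 1)
        ]
    if strategy == 3:
        # Normal on day 1 plus only the latest differential.
        tapes = ["normal(day 1)"]
        if failure_day > 1:
            tapes.append(f"differential(day {failure_day})")
        return tapes
    raise ValueError("strategy must be 1, 2, or 3")
```

For a failure on day 5, the sketch returns one backup for Strategy 1, five for Strategy 2, and two for Strategy 3, which matches the tape counts described above.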
The backup strategy you select will largely determine the amount of time you need to recover data. Many organizations select Strategy 1 as a best practice. As Exchange 2000's mission-critical nature grows, your ability to limit the number of tapes, reduce errors and media failures, and simplify procedures becomes increasingly important.
Exchange 2000's Backup Process
Figure 1 illustrates how Exchange 2000's ESE uses its recovery API to perform backups. An agent calls the ESE API and invokes the backup. That agent can be the Win2K Backup utility or a third-party backup package (e.g., Computer Associates'—CA's—ARCserveIT, VERITAS Software's Backup Exec, Legato Systems' NetWorker). The agent specifies the type of backup the ESE will perform. In incremental and differential backups, the ESE processes only log files. Normal (full) backups are more complex. After the agent calls the API and invokes a full backup, the store process (store.exe) tells the ESE (part of the store process) that a backup will start. Then, the ESE creates a patch file for each MDB and truncates transaction logging at the point of the current log-file generation. To truncate the logging, the ESE closes e0n.log as the last generation and creates a new e0n.log, in which n designates the SG number that the log belongs to. This process typically occurs for an entire SG. An SG contains as many as five MDBs that share a set of log files.
The backup agent then asks the ESE to back up the database files. The ESE backs up files by sending 4KB database pages in batches of 16, for a total of 64KB per batch. As the ESE reads pages from the database, it validates the page number and checksum data for each page. The property store (.edb) contains checksums and header information for pages in the streaming database. If the ESE finds an error, the operation stops so that the ESE doesn't back up a corrupted database. This condition generates a -1018 error (a page number or checksum mismatch).
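The batch-and-verify loop can be sketched as follows. This is an illustration only: the 8-byte page header layout and the CRC-32 checksum are assumptions made for the example and don't reflect the ESE's actual on-disk page format or checksum algorithm. Only the 4KB page size and the 16-page batch come from the description above.

```python
import io
import struct
import zlib
from typing import BinaryIO, Iterator

PAGE_SIZE = 4 * 1024           # 4KB database pages, per the text
PAGES_PER_BATCH = 16           # 16 pages per batch, 64KB total
HEADER = struct.Struct("<II")  # hypothetical header: checksum, page number


class PageVerifyError(Exception):
    """Raised when a page fails verification (the -1018 case in the text)."""


def make_page(page_number: int, payload: bytes) -> bytes:
    """Build a page in the hypothetical layout used by this sketch."""
    if len(payload) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload too large for one page")
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\x00")
    # The checksum covers the page number and the body.
    checksum = zlib.crc32(struct.pack("<I", page_number) + body)
    return HEADER.pack(checksum, page_number) + body


def verify_page(page: bytes, expected_number: int) -> None:
    """Check the stored page number and checksum against the page content."""
    checksum, number = HEADER.unpack_from(page)
    if number != expected_number:
        raise PageVerifyError(f"page {expected_number}: wrong page number {number}")
    if zlib.crc32(page[4:]) != checksum:
        raise PageVerifyError(f"page {expected_number}: checksum mismatch")


def read_batches(db: BinaryIO) -> Iterator[bytes]:
    """Yield verified batches of up to 16 pages; stop at the first bad page."""
    page_number = 0
    while True:
        batch = db.read(PAGE_SIZE * PAGES_PER_BATCH)
        if not batch:
            return
        if len(batch) % PAGE_SIZE:
            raise PageVerifyError("truncated page at end of file")
        for offset in range(0, len(batch), PAGE_SIZE):
            verify_page(batch[offset:offset + PAGE_SIZE], page_number)
            page_number += 1
        yield batch
```

Because verification happens before a batch is yielded, a caller that writes each batch to the backup medium never writes a batch that contains a page that failed the check, which is the behavior the text describes.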
After the ESE successfully copies all database pages for the .edb and .stm files to tape, the backup agent requests patch files and log files and writes them to tape. Patch files track page splits that occur in the database during backup. When page splits occur, the ESE writes the split pages to the patch files for their respective .edb files. The log files that the backup agent requests are those that occur in the generational sequence before the truncation point. After the backup agent backs up the log files, it deletes the log files up to the truncation point. When this process is completed, the backup file set (which includes database files, log files, and patch files) closes, the backup process finishes, and the API returns to the agent. The backup is complete, and the tape or other backup medium now contains a complete backup file set for the Exchange 2000 data. If you want to restore from a backup set, the database files (i.e., the .stm and .edb files), the transaction logs (up to the truncation point), and the patch files must be part of the backup set. If the backup is incremental or differential, the backup process includes only log files, and the steps involving the .edb, .stm, and patch files don't apply.
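As a rough illustration of the backup-set requirement, the sketch below checks a list of file names for the pieces a restore needs. The function name, the .pat extension for patch files, and the one-file-per-MDB naming are assumptions made for the example; a real backup agent tracks its set contents in its own catalog.

```python
from typing import Iterable, List


def missing_from_backup_set(
    backup_type: str, mdb_names: Iterable[str], files: Iterable[str]
) -> List[str]:
    """Return the file names a backup set lacks for a usable restore.

    A normal backup needs the .edb, .stm, and patch file for every MDB
    plus log files. Incremental and differential sets need log files only.
    """
    present = {name.lower() for name in files}
    missing: List[str] = []
    if backup_type == "normal":
        for mdb in mdb_names:
            # ".pat" as the patch-file extension is an assumption here.
            for ext in (".edb", ".stm", ".pat"):
                wanted = f"{mdb}{ext}".lower()
                if wanted not in present:
                    missing.append(wanted)
    elif backup_type not in ("incremental", "differential"):
        raise ValueError(f"unknown backup type: {backup_type}")
    # Every backup type must carry at least one transaction log.
    if not any(name.endswith(".log") for name in present):
        missing.append("*.log")
    return missing
```

An empty result means the set has every piece the text lists; anything else names what to look for before you attempt a restore.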
Exchange 2000's Restore Process
Exchange 2000's restore process is more complex than the backup process because during the restore process, the ESE needs to manage concurrent operations. In some cases, current database and log files exist. Figure 2 illustrates Exchange 2000's restore process.
An agent calls the recovery API to start the procedure. The store process tells the ESE that restore operations are starting for a particular SG. For the recovery procedure to begin, the administrator or the backup-and-restore software must dismount the affected MDBs. The ESE launches an ESE recovery instance (i.e., an SG) that exists only while the recovery operation proceeds. (The recovery instance writes a file called restore.env to the default database directory. Restore.env tracks current restore-in-progress information such as database paths. The restore.env file replaces the RestoreInProgress Registry key that previous versions of Exchange used.) Then, the agent begins to restore the MDBs to the server database path. By default, all MDBs within an SG reside in a common directory. You can use the Exchange System Manager to manage MDB, temporary file, and transaction log file directories' locations.
The ESE copies the database files to the directory by letting the agent directly open a file handle to the disk and copy the .edb and .stm files. When the copying is completed, the ESE restores the log and patch files. However, because log files for the SG or MDB under restoration might already exist on the server, ensure that you don't restore the log and patch files to the same location. Before the restore begins, the ESE asks you for a temporary location to restore the log and patch files that are associated with the backup. The location you specify, as well as the production database directory location, must have enough available disk space to hold the files restored from backup. In Exchange 2000, as in Exchange Server 5.5, database locations are hard-coded in the log files. However, the ESE's recovery instance replays log files from the temporary location to the databases that reside in the production database directories.
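Before you start a restore, you can check the disk-space requirement with a few lines of script. The sketch below uses Python's standard shutil.disk_usage call; the function name and the 10 percent safety margin are assumptions made for the example, and you would supply the directory paths and the byte counts from your own backup catalog.

```python
import shutil
from typing import Dict


def space_shortfalls(required: Dict[str, int], margin: float = 0.10) -> Dict[str, int]:
    """Return, for each directory lacking space, how many bytes it is short.

    required maps a directory path (for example, the temporary log
    location or the production database directory) to the number of
    bytes the restore will write there. margin adds headroom on top.
    """
    shortfalls: Dict[str, int] = {}
    for directory, needed in required.items():
        needed_with_margin = int(needed * (1 + margin))
        free = shutil.disk_usage(directory).free
        if free < needed_with_margin:
            shortfalls[directory] = needed_with_margin - free
    return shortfalls
```

An empty dictionary means both locations can hold the restored files. If the temporary location and the production directory share a volume, add their requirements together before you call the function, because the sketch checks each path independently.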
At this point, the ESE recovery instance that launched when the restore began takes an active role. The recovery instance applies the page splits that occurred during the backup to the database files. Then, it processes the log files from the restored backup file set in the temporary location on the server. The recovery instance also processes the current log files. When the ESE restores one MDB into an SG that contains multiple MDBs, the ESE processes only the transactions within log files that pertain to that MDB. After the ESE recovery instance applies the patch files, the backup set of log files, and the current log files to the database, its job is complete and it deletes the restore.env file. The recovery instance will clean up, exit, and return control to the primary SG instance, which then brings the database online. The restore operation is concluded. If restoration involves only differential or incremental backups (log files only), the steps involving database and patch files don't apply.
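The order of operations in the recovery instance can be sketched as a simple plan builder. The LogRecord type and the plan_replay function are hypothetical names for this illustration; the real ESE works on binary log records, not on objects like these. The sketch shows only the sequencing (patch pages, then restored logs, then current logs) and the per-MDB filtering described above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LogRecord:
    generation: int  # log-file generation the record came from
    mdb: str         # database the transaction belongs to
    operation: str   # opaque description, for this sketch only


def plan_replay(
    target_mdb: str,
    patch_pages: Sequence[int],
    restored_logs: Sequence[LogRecord],
    current_logs: Sequence[LogRecord],
) -> List[str]:
    """List the recovery steps for one MDB in the order the text describes."""
    # Step 1: apply the page splits captured in the patch file.
    steps = [f"apply patch page {page}" for page in patch_pages]
    # Steps 2 and 3: replay restored logs, then current logs, in
    # generation order, skipping transactions for other MDBs in the SG.
    for source, records in (("restored", restored_logs), ("current", current_logs)):
        for record in sorted(records, key=lambda r: r.generation):
            if record.mdb == target_mdb:
                steps.append(
                    f"replay {source} gen {record.generation}: {record.operation}"
                )
    return steps
```

Filtering by MDB is what lets one database in an SG be restored from a shared log-file set without replaying the transactions of its neighbors.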
Support for snapshot technology as a method for backup and restore is a new feature that Microsoft will add to Exchange 2000. Exchange 2000 developers are still deciding how they'll add support for cloning and snapshots (aka Business Continuance Volumes—BCVs) to Exchange 2000. Two types of BCVs exist—BCV clones and BCV snaps. A BCV clone copies data by breaking off a mirror set member in a RAID 1 disk set. A BCV snap makes a point-in-time copy of data by creating a volume block map of the data when the BCV forms. You can use BCVs for backup or as rapid-recovery data sets for application data. Terminology and features vary among vendors' BCV technology products.
Microsoft's support for BCV technology in Exchange 2000 will probably let vendors' technologies easily integrate with Exchange 2000 backup and recovery procedures. Expect Exchange 2000 to include BCV clone and snap support by using the BCV as the medium on which the database files reside. By using the BCV as a restore medium, the ESE will manage the transaction log files that the system needs to complete recovery. In this scenario, the BCV functions as the source of the database files that the ESE recovery instance uses.
Best Practices for Exchange 2000
Because Exchange 2000 is new and not many organizations have fully deployed it yet, suggestions for best practices for disaster recovery might be premature. However, you need to revisit your Exchange Server disaster-recovery procedures as you deploy Exchange 2000, and you need to understand the effect that new features such as multiple SGs have on disaster recovery. You might be able to eliminate support for the Exchange Server directory that earlier versions used, because Exchange 2000 relies on AD, but don't neglect AD. Also, don't forget configuration management. The complexities of laying out multiple databases and SGs will require a high degree of configuration control. From an operations perspective, understand the scenarios in which you might have to recover Exchange 2000. Practice for catastrophe recovery as well as individual mailbox recovery. Also, make sure to include the complete system in your backup plans and provide for redundant or backup hardware, drivers, patches, OSs, and IIS metabases. If you perform daily full backups with Exchange Server, you'll probably want to continue doing so in Exchange 2000.
Power with Responsibility
With the advent of multiple SGs and MDBs in Exchange 2000, the recovery API stretches to accommodate new scenarios. You can perform backup-and-restore operations concurrently for the entire server, an SG, or one MDB. Exchange 2000 offers a great deal of flexibility and availability that earlier Exchange Server versions don't offer. For example, you can have five SGs that each host 1000 users. You can begin restore operations for one SG or MDB without affecting other SGs. The restore operation would affect 1000 users in one SG, while 4000 other users in other SGs would continue to access their data online. Exchange 2000's multiple-SG feature gives operators more options for allocating data and partitioning that data into manageable units based on performance or disaster-recovery service levels. However, Exchange 2000's complexity also complicates procedures, requires better operator training, and carries more error potential. Gather the knowledge you need to ensure that you implement solid disaster-recovery plans for Exchange 2000.