Recently, on an Exchange Server mailing list I frequent (MS-Exchange Management issues, hosted by Sunbelt Software, http://www.sunbelt-software.com/Communities), we were discussing one of the perennial topics of Exchange administration: defragmenting Exchange mailbox stores. It's well documented that offline defragmentation isn't a maintenance procedure you should perform regularly; for instance, see the Microsoft Exchange Team Blog post "Is offline defragmentation considered regular Exchange maintenance?" at msexchangeteam.com/archive/2004/07/08/177574.aspx.
As usual, it took some time in the conversation to convince naysayers that regular defragmentation isn't desirable, if for no other reason than that you have to take your Exchange database offline, which affects end-user availability. There are third-party Exchange management products that automate Eseutil (and provide other functions). It can be difficult to convince people who have spent money on such a product that it may not provide any true benefit to their Exchange environment. You must carefully evaluate the value proposition of such products.
Some of these third-party Exchange management products indicate that the Information Store has errors that the product can resolve. After investigation, it turns out the Isinteg utility is almost always responsible for these warnings. To understand why, you have to look at the big picture, but before you can look at the big picture, you have to know the individual pieces. Let's take a look at the pieces, then put them together to explain. Note that whenever Exchange detects a real error, you'll get a message in the event log of the Exchange server. This has been true in all versions of Exchange. However, there is no question that as Exchange has evolved, the product has gotten more and more stable, requiring less and less maintenance.
An Exchange store is just an Extensible Storage Engine (ESE) database. ESE used to be called a Joint Engine Technology (JET) database engine. At one point, JET split into two paths: JET Red and JET Blue. JET Red became Microsoft Office Access, and JET Blue became ESE, which has been a core, built-in piece of the Windows OS in both the client and server versions since Windows 2000. Exchange isn't the only Windows technology that uses ESE; Active Directory, DHCP, WINS, Windows Mail, Windows Desktop Search, and other Windows functionality depend on ESE as well. For more information about the JET engines (pun intended), see the excellent article in Wikipedia (en.wikipedia.org/wiki/Extensible_Storage_Engine).
With that background information, you can see that Eseutil is "ESE Utility." And ESE is just a database engine. So, the operations Eseutil performs are fairly generic; it can do things such as defragmentation, recovery, verification, file dumping, and (scarily) repair. These operations aren't Exchange specific—you might need to perform these operations on any ESE database. Logical operations, at the Exchange level, are handled by Isinteg, which I will discuss shortly.
I'm sure you're wondering, "Do Microsoft SQL Server and other database engines need the same things?" The answer is yes. SQL Server and other database platforms have utilities to perform the same types of operations. However, the end user usually sees a GUI version of those operations, whereas Eseutil pretty much requires that you operate from the command line interface (CLI). The CLI has advantages and disadvantages. Obviously, the learning curve is higher. However, the level of control is also extremely precise. With a GUI, it's rare to have as many capabilities exposed as you'll find in command line tools.
Eseutil operates in six modes: defragmentation, restore, recovery, integrity (verify), file dump, and repair. Restore mode is used for hard recovery, which is a transaction log replay process that occurs after restoring a database from an online backup. Recovery mode performs a soft recovery, which occurs when a database is re-mounted after an unexpected stop, or when transaction logs are replayed into an offline file copy backup of a database. Exchange Server 2007 introduces two new modes to Eseutil, checksum and copy file, which I'll discuss later. For now, let's look at the first six modes.
Defragmentation. Defragmentation mode, which is approximately equivalent to the SQL command DBCC SHRINKDATABASE, is used to remove unused space (referred to as whitespace by Exchange documentation) from an ESE database.
Restore. Restore mode is approximately equivalent to a SQL RESTORE WITH RECOVERY command issued after a RESTORE WITH NORECOVERY. It lets you reload a database from backup, reload transaction logs from backup, then replay those logs to bring the ESE database to a consistent and current state. A database is in a consistent state when, after being closed, it requires no recovery or transaction log replay before it can be safely opened and used again.
Recovery. I don't think there's a precise SQL Server equivalent of the recovery mode; SQL Server does this against every database when it's started. Recovery occurs when a reboot happens or a store is freshly mounted. Eseutil makes sure the database is current against the available transaction logs and in a consistent state. Typically, Exchange does this automatically when mounting a store, but Eseutil provides you a mechanism to do it manually without starting the Exchange Information Store service. This mechanism gives you a fine level of control when replaying transaction logs.
Verify. The verify mode, approximately equivalent to a SQL RESTORE VERIFYONLY command, lets you verify that a database backup is good and that a repair operation isn't required. It's effectively a read-only version of the repair command.
File dump. Approximately equivalent to several SQL DBCC informational commands, file dump mode lets you examine certain internal structures of the Store that might be useful in debugging, recovery, or crash-analysis situations.
Repair. Approximately equivalent to the SQL DBCC CHECKDB command, followed by various DROP statements for corrupt tables, indexes, constraints, views, and so forth, repair mode lets you get some data out of a corrupt database. How much data a repair operation loses depends on which pages of the database are corrupt; the loss can be anything between nothing and everything.
Except for repair mode, none of the Eseutil operations are lossy operations—that is, you won't lose data. With repair, it's possible to lose data. The repair operation treats an Exchange store as just another ESE database and attempts to recover as much data as possible. After you run repair, you need to apply the necessary application constraints and changes to the Exchange store before Exchange can use it. That's the job of Isinteg.
It’s a common comment in the various Exchange communities that Repair is misnamed. The operation should be called Salvage instead.
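The log replay behind the restore and recovery modes described above can be pictured with a small sketch. This is only an illustration under my own assumptions: ToyDatabase and replay_logs are hypothetical names, and the structure is a toy, not the ESE file or log format. It shows the idea of applying log generations in order and refusing to continue past a gap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical stand-in for a database file; not the ESE format."""
    last_applied: int = 0                      # highest log generation already applied
    records: dict = field(default_factory=dict)


def replay_logs(db, logs):
    """Apply log generations in order and stop at the first gap.

    `logs` maps a generation number to a list of (key, value) writes.
    Returns True when every newer log was applied, False when a
    generation is missing and replay had to stop early.
    """
    for generation in sorted(g for g in logs if g > db.last_applied):
        if generation != db.last_applied + 1:
            return False                       # missing log: cannot safely continue
        for key, value in logs[generation]:
            db.records[key] = value
        db.last_applied = generation
    return True
```

In this toy model, a database is "consistent and current" once replay_logs returns True; a False result corresponds to the case where a required log is missing and the database can't be brought fully up to date.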
Backup and Replication
Before Exchange Server 2003, the only way to completely back up an Exchange store was with the streaming backup API. This is the mechanism used by NTBackup and most third-party agents that perform Exchange backups. Some third-party backup applications, such as Symantec BackupExec and UltraBac, also provide mailbox backup or item-level backup; this is a MAPI-based item-by-item operation, is very slow, and isn't usable in a disaster-recovery situation because it backs up only mailboxes and their contents, not store-level information.
Beginning with Exchange Server 2003, you can perform snapshot backups when running on Windows Server 2003 or later. A snapshot backup uses Microsoft Volume Shadow Copy Service (VSS). Basically, the Exchange Store temporarily flushes its buffers to disk to allow the OS time to make a block-level copy to another file or volume. (That's a gross oversimplification, but you get the basic idea.) In fact, on Windows Server 2008 and Exchange 2007, the native backup application no longer supports streaming backup. A VSS copy or a third-party backup application is required for backing up the Exchange stores.
One of the things a streaming backup does is verify the Store. Every record that's read from the Store and written to the backup destination is checksummed and validated. (Any time that Exchange reads or writes a record to the store, it automatically checks the checksum.) If Exchange detects an error during that validation, the backup is aborted. However, because VSS works at the block level and not the Exchange record level, a mechanism was needed to perform this same check on a VSS backup of the store. Thus the checksum option for Eseutil was created. It performs that validation on a quiescent copy of an ESE database. A VSS backup won't return "success" until the checksum process is complete. Note that this also applies to copies of databases created by Exchange 2007's cluster continuous replication (CCR), local continuous replication (LCR), and standby continuous replication (SCR).
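To make the idea of checksum validation concrete, here is a minimal sketch of checking every page in a quiescent copy of a file. The layout is purely my assumption for illustration (a 4,096-byte page that begins with a 4-byte CRC32 of the rest of the page); the real ESE page format and checksum algorithm are different, and make_page and bad_pages are hypothetical helper names.

```python
import struct
import zlib

PAGE_SIZE = 4096   # assumed page size for this sketch only
HEADER = 4         # assumed: a 4-byte CRC32 stored at the start of each page


def make_page(payload: bytes) -> bytes:
    """Build one toy page: CRC32 of the zero-padded payload, then the payload."""
    if len(payload) > PAGE_SIZE - HEADER:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - HEADER, b"\0")
    return struct.pack("<I", zlib.crc32(body)) + body


def bad_pages(data: bytes) -> list:
    """Return the indexes of pages that are truncated or fail their checksum."""
    failures = []
    for offset in range(0, len(data), PAGE_SIZE):
        page = data[offset:offset + PAGE_SIZE]
        if len(page) != PAGE_SIZE:
            failures.append(offset // PAGE_SIZE)      # truncated final page
            continue
        (stored,) = struct.unpack("<I", page[:HEADER])
        if stored != zlib.crc32(page[HEADER:]):
            failures.append(offset // PAGE_SIZE)      # contents changed after write
    return failures
```

A backup built on a check like this would report success only when bad_pages returns an empty list, which mirrors the behavior described above: any page that fails validation is reason to abort.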
Most likely you'll need to copy VSS backups of databases quickly to secondary storage media. The standard Windows copy commands are no speed demons, so the copy file option was added to Eseutil. It copies large files very efficiently; by some estimates it's between 20 and 40 percent faster than the normal Windows copy commands, depending on the environment. However, everything we've discussed here is generic to ESE files; none of these procedures are specific to Exchange.
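As a quick reference, the eight Eseutil modes covered so far can be collected in one lookup table. The switch letters below come from Eseutil's own command-line help rather than from anything stated above, so check them against the version installed on your server; the Mode type and lossy_modes helper are hypothetical names of my own.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    switch: str          # Eseutil command-line switch for the mode
    purpose: str         # one-line summary of what the mode does
    can_lose_data: bool  # True only where the operation may discard data


ESEUTIL_MODES = {
    "defragmentation": Mode("/d", "remove whitespace from the database", False),
    "restore": Mode("/c", "hard recovery after restoring an online backup", False),
    "recovery": Mode("/r", "soft recovery by replaying transaction logs", False),
    "integrity": Mode("/g", "read-only verification of the database", False),
    "file dump": Mode("/m", "examine internal structures", False),
    "repair": Mode("/p", "salvage data from a corrupt database", True),
    "checksum": Mode("/k", "validate checksums on a quiescent copy", False),
    "copy file": Mode("/y", "copy large files quickly", False),
}


def lossy_modes():
    """Return the names of the modes that can lose data."""
    return sorted(name for name, mode in ESEUTIL_MODES.items() if mode.can_lose_data)
```

Calling lossy_modes() on this table returns only the repair mode, which is the point made earlier: repair is the one Eseutil operation that can cost you data.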
Exchange began as an internal Microsoft email platform that supported only a small number of users (visit msexchangeteam.com/archive/2008/01/02/447806.aspx for more information). After developers were able to improve the product's performance and scalability, it was released to the public, but it still had significant limitations in comparison to today’s product. And, unfortunately, ESE was somewhat buggy.
Isinteg (short for Information Store Integrity Checker) was originally developed as a debugging tool. Eseutil told you whether a database had integrity from an ESE perspective, but Eseutil wasn't Exchange aware (at that time, Eseutil was named Edbutil). Exchange stores have a specific format—a schema—as does any other database. (Microsoft doesn't publish the specific schema for Exchange; only generalities and certain specifics—such as those displayed by Isinteg and those used in certain instructional classes—are public.) Exchange stores also have a specific set of internal referential integrity checks. For example, there are two message roots, and each mailbox has a specific identifier and requires certain specific subfolders, not to mention the hidden folders and indexes. People supporting Exchange stores needed a way to tell whether a particular ESE database adhered to the requirements of an Exchange store—thus, Isinteg was born.
Isinteg, following the philosophy of "any data is better than no data," was given the ability to truncate pieces of an Exchange store that it found to be corrupt. Depending on what pieces end up truncated, a user might lose data or access to a mailbox. In the worst case, the mailbox directory itself could be corrupt. If Isinteg found it necessary to delete that directory, then all the data in the Exchange store could be lost; this is why there are two message roots.
Eventually, Isinteg made it into the standard Exchange distribution because it had a couple of very useful capabilities. In versions of Exchange up to and including Exchange Server 5.5, you could use Isinteg to "patch" the Store so that you could recover an Exchange database from an offline backup or move a copy of an offline backup from one server to another (with some specific limitations, of course). You could also use it to analyze problems in the Exchange tables.
The patch capability was removed from Isinteg in Exchange 2000 Server, along with the distinctions between public and private stores. However, Isinteg can still analyze all Exchange tables and determine whether Exchange referential integrity is preserved. Using Isinteg with the -fix switch can fix broken referential integrity or create it if it doesn't exist, but it has the potential for data loss. As I noted previously, Isinteg deals with the database at a logical level, while Eseutil deals with the database at an EDB level. If Isinteg -fix detects a missing table of contents for a folder, it might orphan those records, effectively losing that data.
Most of the information that Isinteg typically spits out is just a bunch of warnings. You get a warning, for example, whenever the count of items in a folder doesn't match the actual number of items in the folder. In ESE, these counts are recalculated only when necessary, so this isn't considered an error. You also get a warning when an item has been deleted but the parent index hasn't been updated to remove the pointer (delaying these kinds of updates can also be an efficiency optimization—by not executing a physical I/O operation until it is actually required). Isinteg can spit out dozens of other warnings, none of which cause Exchange to stop using the Store or are indicative of a significant problem.
However, if Isinteg finds an actual error (which is marked as an error in the Isinteg output), you've really got a problem—your database is corrupt. That does not mean that running -fix is the best option. In general, if Isinteg finds an error, you can expect that the verify mode of Eseutil will find the same error. It's probably an ESE error rather than an Exchange error.
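The difference between a warning and an error can be sketched with a toy logical check. Everything here is hypothetical: the folder and item structures, the check_folder name, and the message text are my own inventions and have nothing to do with the real Exchange schema or with Isinteg's actual output. The sketch only mirrors the cases mentioned above: a stale item count and a leftover index pointer are warnings, while a folder with no table of contents is an error.

```python
def check_folder(folder, items):
    """Classify problems in one toy folder record.

    folder: {"id": str, "cached_count": int, "index": list of item ids or None}
    items:  {item_id: folder_id} for every item that actually exists
    Returns a (warnings, errors) pair of message lists.
    """
    warnings, errors = [], []
    index = folder.get("index")
    if index is None:
        # Nothing ties the items to this folder any more: a real problem.
        errors.append("folder has no table of contents")
        return warnings, errors
    actual = sum(1 for parent in items.values() if parent == folder["id"])
    if folder["cached_count"] != actual:
        # The count is recalculated lazily, so a mismatch is harmless.
        warnings.append("cached item count does not match actual count")
    if any(item_id not in items for item_id in index):
        # A deferred index update, also harmless.
        warnings.append("index still points at a deleted item")
    return warnings, errors
```

In this model, a tool that reports only the warnings list is reporting bookkeeping that would correct itself on the next recalculation, not corruption.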
The primary use for Isinteg today is to follow up after an Eseutil repair operation. After you complete a repair with Eseutil, Isinteg will build the necessary pieces for Exchange to use the repaired store with the information left in the store. Then you should create a new store and move the mailboxes to it as quickly as possible. A repaired store is "known bad" and should not be used in production.
Pulling It All Together
Exchange provides a resilient transaction-based database for its stores. Those databases are reliable, and every successful backup verifies that an Exchange store isn't corrupt. If the backup logic discovers that a store has become corrupt, an entry is logged in the Application event log and the backup aborts.
In the case of an Exchange store failure, you can choose from three primary methods of recovery (not including replication):
- Exchange built-in recovery, including replay of transaction logs (soft recovery)
- Exchange reload recovery, including replay of transaction logs (hard recovery)
- Exchange repair recovery, followed by Isinteg to rebuild Exchange tables and indexes (catastrophic recovery)
There are no documented uses of Isinteg with -fix for current versions of Exchange (Exchange Server 2007 and Exchange Server 2003, both at current service packs) except following an Eseutil repair operation. You should never run disaster-recovery tools as part of normal maintenance.
If a backup aborts, but you experience no other errors when using a particular store, the current recommended recovery process is to bring up another store and move all the mailboxes to the new store. When the old store is empty, dismount it and delete it. This method involves zero downtime.
If you're on Exchange 2003 Standard Edition (and therefore don't have the capability of bringing up another mailbox store), the current recommended recovery process is to first attempt an Eseutil defrag to see if that fixes the problem. If it does, you're home free. If it doesn't, and you have the server resources available, consider standing up a second Exchange Standard server, moving all the mailboxes to the new server, then decommissioning the old server. If that isn't possible, then and only then should you attempt an Eseutil repair followed by an Isinteg -fix.
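The decision process in the last two paragraphs can be written down as a short sketch. The recovery_plan function and its parameter names are hypothetical; it simply returns the steps described above, in order, for a store whose backup aborts but which shows no other errors.

```python
def recovery_plan(can_add_store, defrag_fixed_it=False, spare_server=False):
    """Return the ordered steps for a store whose backup aborts.

    can_add_store:   True when the server can host another mailbox store
    defrag_fixed_it: True when an offline defragmentation cleared the problem
    spare_server:    True when a second server can be stood up
    """
    if can_add_store:
        return [
            "create a new store",
            "move all mailboxes to the new store",
            "dismount and delete the old store",
        ]
    steps = ["run an Eseutil offline defragmentation"]
    if defrag_fixed_it:
        return steps
    if spare_server:
        return steps + [
            "stand up a second Exchange server",
            "move all mailboxes to the new server",
            "decommission the old server",
        ]
    # Last resort only: repair can lose data.
    return steps + [
        "run an Eseutil repair",
        "run Isinteg -fix",
        "move all mailboxes to a fresh store as soon as possible",
    ]
```

Note that the repair path appears only when every other branch has been ruled out, which is the ordering the recommendations above insist on.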
As I noted before, after you've repaired a store you should move all the mailboxes off of it into a fresh store as soon as possible, using one of the options I recommended. I hope this information gives you some further insight into how Exchange maintains, validates, and checks the integrity of its databases.