Do you know how to repair your database? Microsoft provides several utilities in Exchange Server for checking or repairing the Extensible Storage Engine (ESE) databases. You shouldn’t rely solely on these tools, and you should use them only with the help of Microsoft Product Support Services (PSS). This week, let’s look at physical corruption of the Exchange Server database and the tools available to find and recover from this problem. I’ll tackle logical corruption next week.
After Exchange Server passes data to the OS, it relies on the OS, device drivers, and hardware to preserve that data. However, these lower layers don't always provide flawless data protection. Physical corruption of the Exchange Server database is the most severe form of failure that you can experience because you can't repair the data; you must recover it using your established disaster recovery measures, such as restoring from tape backup. When Exchange Server detects physical corruption, it logs the error to the application log in Windows 2000 and Windows NT. Most often, Exchange Server encounters these errors during online backup or database maintenance because the database engine checks every database page during these operations.
Each 4KB database page contains a 40-byte header with information about the page. The header stores both the page number and a checksum, or cyclic redundancy check (CRC). When the ESE reads a database page, it first checks the page number in the header to be sure that this is the page it requested. Next, the ESE validates the CRC. If the page number is wrong or the CRC fails, the database engine reports an error. In Exchange 5.5 Service Pack 2 (SP2) and later, the ESE attempts to re-read and check the database page up to 16 times before declaring the page corrupt and logging the error. When this happens, Exchange Server logs 200/201 series errors to the application log. These errors indicate either that the database engine encountered a bad page but retried successfully, or that it retried 16 times without success and you must recover the database. If the error occurs during online backup, Exchange Server terminates the backup to ensure that you don't back up a corrupted database.
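The read-time check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ESE's actual code: the header layout (a 4-byte checksum, then a 4-byte page number, then padding), the use of zlib's CRC-32, and every function name here are assumptions made for the example. The 4KB page size, the 40-byte header, and the limit of 16 attempts come from the text, although treating 16 as the total number of reads is also an assumption.

```python
import struct
import zlib

PAGE_SIZE = 4096    # 4KB database page (from the text)
HEADER_SIZE = 40    # 40-byte page header (from the text)
MAX_READS = 16      # attempts before the page is declared corrupt

# Hypothetical layout: 4-byte checksum, 4-byte page number, 32 unused bytes.
HEADER_FMT = "<II32x"


class PageCorruptError(Exception):
    """Raised when a page fails validation on every read attempt."""


def _checksum(page_number, body):
    # Stand-in checksum: CRC-32 over the page number and the page body.
    return zlib.crc32(struct.pack("<I", page_number) + body)


def build_page(page_number, payload):
    """Build a toy page so the checks below have something to validate."""
    if len(payload) > PAGE_SIZE - HEADER_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - HEADER_SIZE, b"\x00")
    header = struct.pack(HEADER_FMT, _checksum(page_number, body), page_number)
    return header + body


def page_is_valid(page, expected_number):
    """Check the page number first, then the checksum."""
    if len(page) != PAGE_SIZE:
        return False
    stored_checksum, stored_number = struct.unpack_from(HEADER_FMT, page)
    if stored_number != expected_number:
        return False  # not the page that was requested
    return _checksum(stored_number, page[HEADER_SIZE:]) == stored_checksum


def read_page_with_retry(read_page, page_number, max_reads=MAX_READS):
    """Call read_page(page_number) until a valid page comes back.

    Returns (page, attempts). Raises PageCorruptError once max_reads
    attempts have all failed validation.
    """
    for attempt in range(1, max_reads + 1):
        page = read_page(page_number)
        if page_is_valid(page, page_number):
            return page, attempt
    raise PageCorruptError(
        f"page {page_number} failed validation {max_reads} times"
    )
```

Here read_page is any callable that returns the raw bytes for a page number, such as a function that seeks into a file. A result with attempts greater than 1 corresponds to the "bad page, but the retry succeeded" case, and PageCorruptError corresponds to the case in which you must recover the database.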
To remove physical corruption, ESEUTIL is your only option. ESEUTIL finds bad database pages and removes them from the database. If the bad page contains only data, then that data is lost and might need recovery. However, in a more serious case, the page might contain a B-tree index with database structure information. Losing such a page could render a large part of the database useless without a complete restore. In most cases, you'll probably have to restore the entire database when you encounter physical corruption.
In addition to running regular online backups using the backup API, you can check your Exchange database for physical corruption using ESEUTIL or ESEFILE (ESEFILE has been available since Exchange 5.5 SP3). Both tools let you perform a complete check of the database for physical corruption, and both require that the database be offline (Exchange services shut down). ESEFILE is the faster of the two, but it only checks the database; it doesn't repair it.
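To make the idea of a checking-only pass concrete, here is a rough Python sketch of an offline scan that walks a file page by page and reports the pages that fail validation without changing anything. It is not how ESEUTIL or ESEFILE work internally. The function names, the zero-based page index, and the toy page format (a 4-byte CRC-32 followed by the page body) are all assumptions made for the example.

```python
import zlib

PAGE_SIZE = 4096  # 4KB database page


def scan_pages(stream, is_valid, page_size=PAGE_SIZE):
    """Read a stream page by page and return the indexes of bad pages.

    stream   -- binary file-like object opened for reading
    is_valid -- callable(index, page_bytes) returning True or False
    The scan is read-only: bad pages are reported, never repaired.
    """
    bad_pages = []
    index = 0
    while True:
        page = stream.read(page_size)
        if not page:
            break  # end of file
        # A short final page is treated as damaged.
        if len(page) < page_size or not is_valid(index, page):
            bad_pages.append(index)
        index += 1
    return bad_pages


# Toy page format for demonstration only: 4-byte CRC-32, then the body.
def toy_make_page(data):
    body = data.ljust(PAGE_SIZE - 4, b"\x00")
    return zlib.crc32(body).to_bytes(4, "little") + body


def toy_is_valid(index, page):
    return int.from_bytes(page[:4], "little") == zlib.crc32(page[4:])
```

With a real file you would open it in binary mode while nothing else is writing to it, for example with open(path, "rb"), and pass the file object as stream. An empty result means that every page passed the check.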
The best protection from physical corruption is rock-solid hardware, firmware, and device drivers. However, you still might encounter this problem, and you need to be up to speed on your recovery options. If you perform online backups using the backup API (recommended), Exchange Server will let you know through the application log if it encounters corruption. Make sure you know the available tools (ESEUTIL and ESEFILE) and how to use them when physical corruption occurs.