Inside the Exchange Information Store

Enter this unique database structure and discover what makes Exchange work

Relational database technology lies at the heart of Microsoft Exchange Server. At its simplest, a relational database stores information in tables and uses matching values in the tables to relate information between the tables. When you understand Exchange Server's database technology, you can head off problems and optimize performance. Over the next two months, I'll take a detailed look at the databases that make up the Exchange Information Store. This month, I'll focus on the databases' internal structure, how they process transactions, and what happens when problems occur. Next month, I'll show you how to maintain the databases in the Information Store and maximize their performance.

Exchange Databases and the Information Store
Mailboxes, public folders, and directory information in Exchange are contained in three databases. The private information database holds user mailboxes, the public information database holds public folders, and the directory database holds directory and configuration data. One Windows NT service, the information store service (store.exe), manages the private and public information databases, which together are known as the Information Store. The directory and Information Store contain many files that are important to Exchange administrators. Table 1, page 170, lists some of these important files.

The Exchange database engine. Two versions of Microsoft's Jet database engine manage the Exchange databases. Jet Blue manages the databases in Exchange 4.0 and 5.0, and the Extensible Storage Engine (ESE or ESE97) manages the databases in Exchange 5.5. Microsoft designed both database engines to handle the transaction load Exchange's messaging system generates. Jet Blue and ESE run inside store.exe as either edb.dll or ese.dll, depending on the version of Exchange. The essential features of the Information Store do not change between Exchange 5.0 and 5.5. However, the ESE database engine in Exchange 5.5 is faster and more scalable than Jet Blue. When I refer to the Exchange database engine in this and next month's articles, I'm referring to the ESE engine in Exchange 5.5.

Exchange does not use SQL Server, Microsoft's best-known database, because SQL Server handles commercial transactions, which are usually consistent in form. Messages in Exchange vary in content and length, and this variation challenges a database. For example, some messages in Exchange go to single recipients and contain a few lines of text, whereas other messages go to many addresses and contain several pages of content and sometimes attachments. SQL Server's database design cannot process messages in Exchange effectively.

Single-Instance Storage Architecture
Exchange uses a single-instance storage model to process messages. Single-instance storage stores one copy of a message routed to multiple recipients in the Information Store and deposits a pointer to the message in each recipient's mailbox. The single-instance model is different from the classic LAN-based design for email, which sends separate copies of one message to each of the message's recipients. The LAN model works well for small installations because it doesn't incur the overhead of a database, but it doesn't function effectively if the number of users rises to more than 100.

Increasing the effectiveness of single-instance storage. Single-instance storage is server-specific. The Information Store transfers messages between Exchange mailboxes that reside on one server. The Exchange Mail Transfer Agent (MTA) transfers messages between the mailboxes on different Exchange servers. Transferring messages to users on multiple servers increases data duplication and network traffic, and it can hinder Exchange's scalability. You can optimize the efficiency of single-instance storage in Exchange by ensuring that all members of a workgroup or department have mailboxes on one server.

Relational Tables Inside the Private Information Store
The internal structure of the public and private information stores and the directory is similar to the structure of a classic relational database; that is, the structure consists of tables. Let's look at how these tables function and interact in the private information store, where Exchange processes email.

The most important tables in the private information store are:

The mailbox table: Each mailbox on a server has one row in this table, which holds the mailbox's properties.
The folders table: Each folder in every mailbox has a row in this table.
The message table: Each message has one row in this table, which holds the message's content.
The attachments table: Each attachment has one row in this table, which holds the attachment's content.
A set of message/folder tables: Each folder has its own message/folder table.

Pointers link one table to another within the private information store. Single-instance storage is based on the interactions between pointers and tables, and these interactions let Exchange deliver a unified view of the private information store's contents to clients.

Exchange supports nested folders, in which folders contain subfolders. Client machines construct a tree view of the folders in the private information store by reading the data in the store's folders table. Each folder has a globally unique ID (GUID). Each subfolder has a GUID and also carries its parent folder's ID, which identifies the subfolder as a subfolder. The sample data in Table 2 shows that the Articles and Newsflash folders are both subfolders of the Magazine folder.
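
To make the parent-ID mechanism concrete, here is a minimal Python sketch of how a client could turn folders-table rows into a tree view. The row layout, the short IDs (real folder IDs are GUIDs), and the function name build_tree are assumptions made for illustration, not Exchange's actual schema or client code.

```python
from collections import defaultdict

# Hypothetical folder rows: (folder_id, parent_id, name).
# Real folder IDs are GUIDs; short strings are used here for readability.
FOLDERS = [
    ("F1", None, "Inbox"),
    ("F2", None, "Magazine"),
    ("F3", "F2", "Articles"),
    ("F4", "F2", "Newsflash"),
]

def build_tree(rows):
    """Return a list of indented folder names, parents before children."""
    children = defaultdict(list)
    for folder_id, parent_id, name in rows:
        children[parent_id].append((folder_id, name))

    lines = []

    def walk(parent_id, depth):
        # Emit each child of this parent, then descend into its subfolders.
        for folder_id, name in children[parent_id]:
            lines.append("  " * depth + name)
            walk(folder_id, depth + 1)

    walk(None, 0)  # top-level folders carry no parent ID
    return lines

print("\n".join(build_tree(FOLDERS)))
```

The only information the sketch needs is each row's own ID and its parent's ID, which is why a single pass over the folders table is enough to draw the whole hierarchy.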

When an Outlook client opens a user's mailbox, the count of new items column in the folders table alerts the client to highlight the folder name in the user's Inbox. The highlighted folder name signals the user to review the contents of the folder.

Each folder has its own message/folder table that contains header information (all of which are Messaging API--MAPI--properties) for all the messages in the folder. By maintaining a folder's message header information in a separate message/folder table, Exchange lets each folder sort its own messages. (Table 3 shows the contents of a message/folder table.) With message/folder tables, folders do not need to request individual message data from one large table. Exchange's message/folder table system minimizes data transmission between client and server when clients display information about a folder.

The message ID (another GUID) links each row in a message/folder table to message content. When a user selects an email item in the Inbox and double-clicks the item to read it, the client uses the message ID to retrieve the message content from the message table and combines the header and content information to populate the form that displays the complete message for the user. (Table 4 shows the type of data a message table contains.) Message body content is stored in Rich Text Format (RTF). If the message contains an attachment, Exchange places a pointer to the attachment in the Attachment Pointer field, and the client can use the pointer to retrieve the attachment. The Usage Count field contains a count of all the folders that contain a reference to a particular message. The count in the Usage Count field decreases as users delete references to the message. When the usage count reaches zero, Exchange removes the message row from the message table.
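
The usage-count behavior is essentially reference counting. The following Python sketch models it with a toy message table; the class name MessageTable, its methods, and the short message ID are hypothetical and stand in for whatever the store does internally.

```python
class MessageTable:
    """Toy stand-in for the message table: one row per message."""

    def __init__(self):
        self.rows = {}  # message_id -> {"body": str, "usage_count": int}

    def add(self, message_id, body):
        self.rows[message_id] = {"body": body, "usage_count": 0}

    def add_reference(self, message_id):
        # A folder now points at this message.
        self.rows[message_id]["usage_count"] += 1

    def drop_reference(self, message_id):
        # A user deleted the message from one folder.
        row = self.rows[message_id]
        row["usage_count"] -= 1
        if row["usage_count"] == 0:
            del self.rows[message_id]  # no folder refers to it any longer


table = MessageTable()
table.add("M1", "Quarterly results attached")
for _ in range(3):            # delivered to three Inbox folders
    table.add_reference("M1")
table.drop_reference("M1")
table.drop_reference("M1")
still_there = "M1" in table.rows   # one reference remains
table.drop_reference("M1")
gone = "M1" not in table.rows      # last reference removed the row
print(still_there, gone)
```

Only the last deletion removes the content, which is what lets one stored copy serve every recipient safely.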

In Screen 1 you can see a typical email message. Exchange took data from the Information Store's relational tables and used the process just described to construct this familiar user interface.

The Exchange Transaction Model
In Exchange, all transactions consist of a series of multiple operations against different tables, and Exchange will not accept a transaction unless that series of operations is complete (such indivisible operations are known as "atomic operations"). Let's review what happens when a new email message is delivered to four users on one server.

First, Exchange reads the mailbox table to verify that mailboxes exist for each of the message recipients. Second, Exchange reads the folders table to locate the rows that correspond to each recipient's Inbox. Third, Exchange updates the message table by adding a new row that contains the content of the new message. Fourth, Exchange updates the message/folder table for each of the four Inbox folders, providing header information for the new message. Pointers link each message/folder table with the new message's row in the message table. If the new message contains attachments, Exchange adds rows corresponding to the attachments to the attachments table. Fifth, Exchange updates the Inbox row in the folders table for each user by incrementing the Count of Items and Count of New Items columns. At this point the transaction is complete.
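
The five steps can be sketched in Python as one all-or-nothing operation. This is a simplified model under stated assumptions: the dictionary layout, the names new_store and deliver, and the trick of working on a copy are mine, chosen only to show that a failed delivery leaves no partial changes behind. Attachments are omitted.

```python
import copy

def new_store(mailboxes):
    """Build a toy store: every mailbox gets an Inbox with zeroed counters."""
    return {
        "mailboxes": set(mailboxes),
        "folders": {m: {"items": 0, "new_items": 0} for m in mailboxes},
        "messages": {},
        "inbox_headers": {m: [] for m in mailboxes},
    }

def deliver(store, message_id, subject, body, recipients):
    """Apply the five steps to a working copy; return it only if all succeed."""
    work = copy.deepcopy(store)
    for r in recipients:                                  # step 1: mailboxes exist?
        if r not in work["mailboxes"]:
            raise KeyError(f"no mailbox for {r}")
    inboxes = [work["folders"][r] for r in recipients]    # step 2: locate Inbox rows
    work["messages"][message_id] = {                      # step 3: one content row
        "body": body,
        "usage_count": len(recipients),
    }
    for r in recipients:                                  # step 4: header per Inbox
        work["inbox_headers"][r].append((message_id, subject))
    for inbox in inboxes:                                 # step 5: bump the counters
        inbox["items"] += 1
        inbox["new_items"] += 1
    return work


store = new_store(["ann", "bob", "cy", "dee"])
store = deliver(store, "M1", "Hello", "Body text", ["ann", "bob", "cy", "dee"])
try:
    store = deliver(store, "M2", "Oops", "Body", ["ann", "nobody"])
except KeyError:
    pass  # the failed delivery never touched the real store
print(len(store["messages"]), store["folders"]["ann"]["items"])
```

Note that four recipients produce four header entries but only one row of content, which is single-instance storage at work.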

Exchange saves each transaction by writing it to the current transaction log (edb.log), and also to a queue in memory. The Information Store manipulates the contents of the memory queue to carry out the eventual writes to the database in the most efficient manner. For example, if system load is heavy, Exchange regularly updates the rows for Inbox folders in the folders table. By referring to the memory queue, Exchange can organize the transactions and commit changes for Inbox folders in one operation. Exchange always gives client interactions higher priority than background processing, so when system load is heavy, transactions can build up in the memory queue. Exchange then flushes the transactions from the memory queue when the load on the system decreases.
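
One way to picture the benefit of the memory queue is as write coalescing: several queued updates to the same row collapse into a single database write. The Python sketch below illustrates that general idea only; the queue format and the function name coalesce are assumptions, and the store's real scheduling is certainly more involved.

```python
def coalesce(queue):
    """Merge queued (row, delta) updates so each row is written once."""
    merged = {}
    for row, delta in queue:
        merged[row] = merged.get(row, 0) + delta
    return merged


# Three queued counter updates, two of them for the same Inbox row.
queue = [("ann/Inbox", 1), ("bob/Inbox", 1), ("ann/Inbox", 1)]
writes = coalesce(queue)
print(len(queue), "queued updates become", len(writes), "database writes")
```

Because every transaction is already safe in the transaction log, deferring and combining the database writes this way costs nothing in recoverability.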

Client notification of a transaction occurs after the transaction is completed. MAPI users receive a remote procedure call (RPC) notification. Other users, such as those using Outlook Web Access or Post Office Protocol 3 (POP3), must check the Inbox at intervals to find new messages.

The Exchange Information Store is therefore composed of databases, transactions in memory, and logs of those transactions. For an administrator to focus exclusively on managing the databases would be a mistake. In an Exchange operational environment, you must manage the databases, transactions in memory, and transaction logs as a single entity.

In Exchange, transaction logs are a crucial element, the first port of call for any change an item stored in an Exchange database undergoes. If an error occurs and results in system memory loss, the data in the transaction logs lets Exchange recover all transactions that were not committed to the database at the time of the memory loss. Keeping transaction logs on the same disk as the Exchange databases is risky: If you lose your Exchange databases, you'll also lose the transaction logs that let you update a backup copy of the databases.

A transaction log always holds 5MB of data, and Exchange writes to a transaction log sequentially, appending data to the end of the current log. Individual messages containing more than 5MB of data span multiple transaction logs. When one transaction log is filled, Exchange creates a new file to hold the data for new transactions while it renames the filled transaction log. As Screen 2 shows, filled transaction logs (always named edb) are renamed edbxxxxx ("xxxxx" represents a hexadecimal number) and become database files. After a filled transaction log is renamed, Exchange names the temporary holding file edb, and this file becomes the current transaction log. When Exchange commits the data from a transaction log to the database, it advances a checkpoint in the transaction log to the point where it committed the data to the database. In recovery situations, Exchange uses the checkpoint to locate data in the log that must be written to the database.
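
A short Python sketch shows the naming pattern and why a large message spans several logs. The .log extension on renamed files and the lowercase hexadecimal digits are assumptions based on the edb.log name used earlier; the calculation also ignores any per-record overhead inside a log.

```python
LOG_SIZE = 5 * 1024 * 1024  # each transaction log holds 5MB

def archived_log_name(generation):
    """Name for a filled log: 'edb' plus a five-digit hexadecimal number."""
    return f"edb{generation:05x}.log"

def logs_spanned(payload_bytes, used_in_current=0):
    """How many log files a write touches, given bytes already used in edb.log."""
    total = used_in_current + payload_bytes
    return -(-total // LOG_SIZE)  # ceiling division

print(archived_log_name(1), archived_log_name(26), archived_log_name(255))
print(logs_spanned(12 * 1024 * 1024))
```

Because the numbering is hexadecimal, the log after edb00009 is edb0000a, not edb00010, which is worth remembering when you check that a restored set of logs is complete.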

Transaction logs include interpersonal messages and replication traffic for public folders and the directory. A lightly used server might generate only two or three logs a day, but systems that support thousands of mailboxes can generate up to 2GB of transaction logs a day. You must place Exchange transaction logs on a disk that is unlikely to run out of space. If transaction logs exceed available disk space, the information store service will terminate. By convention, Exchange creates transaction logs for the Information Store in a directory called \MDBDATA and creates transaction logs for the directory store in \DSADATA. You can move these directories to other disks through the Database Paths property page for a server, as Screen 3 shows.
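
For disk planning, the arithmetic is simple enough to sketch in Python. The 9GB of free space in the example is a hypothetical figure; the 5MB log size and the 2GB-a-day volume come from the text above.

```python
import math

LOG_MB = 5  # size of one transaction log

def logs_per_day(log_mb_per_day):
    """Number of 5MB log files needed to hold a day's transactions."""
    return math.ceil(log_mb_per_day / LOG_MB)

def days_until_full(free_mb, log_mb_per_day):
    """Days of logging the disk can absorb if no logs are removed."""
    return free_mb / log_mb_per_day

busy_server_logs = logs_per_day(2 * 1024)           # 2GB of logs a day
headroom = days_until_full(9 * 1024, 2 * 1024)      # hypothetical 9GB free
print(busy_server_logs, headroom)
```

A result measured in days rather than weeks is a signal to enlarge the log disk or to shorten the interval between full backups.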

Exchange automatically deletes transaction logs when you make a full online backup. You can delete the logs manually to free disk space by shutting down the information store service (doing so will flush all transactions from memory) and then making a file-level backup. Don't delete a log unless you are certain you'll never need to recover the transaction data it contains. In other words, make sure you have a successful backup of the Information Store before you delete logs.

Two logs--res1.log and res2.log--are empty 5MB files that Exchange reserves to use to capture data if the disk holding the Information Store runs out of space. Well-managed servers should never run out of space, but in the event a server does so, the information store service will conduct an orderly shutdown. During such a shutdown, Exchange uses the space in the reserved logs to extend the Information Store or create a new transaction log.

Hard and Soft Recoveries
Any transactional-model database needs to be able to recover from interruptions in service, and Exchange responds to hard and soft recovery situations. Hard recoveries occur when you replace a disk and recover its data from a backup. Soft recoveries happen automatically when the information store service terminates abnormally (e.g., someone turns off a server without going through the normal shutdown process), in which case some transactions are not committed to the database.

Imagine what happens in the example above, when someone shuts off a server without going through a normal shutdown. When the server shuts down, all services stop, including the information store service. At this time, Exchange commits outstanding transactions in memory to the database. On heavily used servers, the process of committing outstanding transactions from memory to the database can take several minutes, and people sometimes think the server is hung when nothing appears to be happening during an extended shutdown.

When the server restarts, the information store service executes a soft recovery by consulting the checkpoint stored in edb.chk to see whether any transactions remain outstanding. If edb.chk is not present, the information store service reviews the contents of the current transaction log, edb.log, to verify that the system is running normally. Exchange replays outstanding transactions and commits them to the database before it lets clients connect. The time a system takes to recover a transaction log in a soft recovery depends on the speed of the system CPU and disk I/O subsystem, but even the smallest system should be able to recover transaction logs at a rate of at least one log per minute.
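
In outline, a soft recovery is a replay of everything in the log that lies beyond the checkpoint. The Python sketch below models that idea with a dictionary standing in for the database and sequence numbers standing in for positions in the log; the record format and the function name soft_recover are assumptions, not the engine's real structures.

```python
def soft_recover(database, log_records, checkpoint):
    """Replay log records after the checkpoint; return the new checkpoint.

    database: dict of key -> value (stand-in for the on-disk database)
    log_records: list of (sequence_number, key, value), in write order
    checkpoint: sequence number of the last record already in the database
    """
    for seq, key, value in log_records:
        if seq <= checkpoint:
            continue  # already committed before the interruption
        database[key] = value
        checkpoint = seq  # advance the checkpoint as each record is applied
    return checkpoint


db = {"inbox_count": 1}
log = [(1, "inbox_count", 1), (2, "inbox_count", 2), (3, "inbox_count", 3)]
new_checkpoint = soft_recover(db, log, checkpoint=1)
print(db["inbox_count"], new_checkpoint)
```

Running the function a second time with the returned checkpoint changes nothing, which mirrors why a restart after a clean shutdown has no transactions to replay.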

A soft recovery is an automatic process that requires no intervention. Many system administrators are probably not aware when soft recoveries occur, although you can track soft recoveries if you look for them in the application event log. As you can see in Screen 4, Exchange recovers transactions for the directory store by using the same mechanism it uses to recover transactions for the Information Store.

In contrast to soft recoveries, hard recoveries require administrator intervention. Therefore, after you replace a hard disk, you must restore the last full backup and any incremental backups that have occurred, including all transaction logs generated since the last full backup. After you restore the Information Store and all transaction logs, you can restart the information store service, which replays the transactions in the logs. The information store service replays the transaction logs sequentially, so you must restore all the logs. If you don't restore all the logs, the replay procedure will terminate before it recovers all transactions. After the replay procedure completes, stop the information store service and run the ISINTEG utility with the PATCH switch, which instructs ISINTEG to adjust the GUIDs inside the store. Take a full backup after you run the ISINTEG utility. The entire procedure can take several hours, so protect yourself during the recovery process by putting the Information Store on a RAID 5 array, preferably one that supports hot-swappable disks.

The circular logging process in Exchange helps small servers conserve the disk space that transaction logs consume. In circular logging, Exchange reuses a set of between one and four transaction logs after it commits the transactions they contain to the database. If the process runs smoothly, circular logging never causes a problem. But life, computer systems, and especially hardware tend to hit rough patches from time to time. If you lose a database through hard-disk failure, the combination of transaction logs and a recent backup is enough to get the system back online without data loss. But if the transaction logs are recycled logs, Exchange cannot replay some transactions from them after a hard-disk failure. Therefore, in recovery situations when circular logging is enabled, you can recover from recycled logs only those transactions your last full backup saved.
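
The risk is easy to see in a small Python model. In this sketch, log generations are plain integers, circular logging keeps only the newest four, and a roll-forward succeeds only if every log written since the backup is still on disk. The function names and the fixed window of four are simplifying assumptions; the real engine recycles a log only after its transactions are committed.

```python
from collections import deque

def logs_available(generated, circular, keep=4):
    """Return log generations still on disk after `generated` logs were filled."""
    all_logs = range(1, generated + 1)
    if circular:
        # A bounded deque discards the oldest entries, like recycled logs.
        return list(deque(all_logs, maxlen=keep))
    return list(all_logs)

def can_roll_forward(backup_generation, on_disk):
    """True if every log written after the backup is still available."""
    needed = range(backup_generation + 1, max(on_disk) + 1)
    return all(g in on_disk for g in needed)


normal = logs_available(10, circular=False)
recycled = logs_available(10, circular=True)
print(can_roll_forward(3, normal), can_roll_forward(3, recycled))
```

With a backup taken at generation 3, the server without circular logging can replay logs 4 through 10, whereas the server with circular logging has already overwritten logs 4 through 6 and can restore only to the state of the backup.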

Circular logging is enabled by default on all Exchange servers, and you should disable it as soon as possible after you install a new mailbox server, unless you want to use the server for testing. For example, you might keep one server for installing new service packs and set it up for circular logging. Servers that act as message switches don't hold much data in the Information Store, so you can enable circular logging on them, too. However, don't expose yourself to the risk of data loss by running circular logging on production servers. As you can see in Screen 5, you disable circular logging separately for the information and directory stores.

Upgrading to the Unlimited Store
Exchange 5.0 and 5.5 use the same essential file formats and database schemas for the Information Store. Exchange 5.5 can let the Information Store grow beyond version 5.0's 16GB limit, but it does so invisibly. Exchange builds the Information Store in a set of pages, each of which contains 4096 bytes. The Exchange engineers changed the size of the page pointer in Exchange 5.5 to increase the size of the Information Store, which is now limited only by the amount of disk space a server's I/O subsystem can handle (the theoretical limit is around 16TB). The 16GB limit remains for Exchange 5.0 and earlier versions.
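
You can check how the two limits relate to the page size with a few lines of Python. The page size and both limits come from the text above; reading the resulting powers of two as page-pointer widths is my inference, not a documented detail of the engine.

```python
PAGE_SIZE = 4096  # bytes per page, from the text

def pages_for(limit_bytes):
    """Number of 4096-byte pages that fit within a store size limit."""
    return limit_bytes // PAGE_SIZE

GB = 1024 ** 3
TB = 1024 ** 4

old_pages = pages_for(16 * GB)
new_pages = pages_for(16 * TB)
# bit_length() - 1 gives the exponent when the value is an exact power of two.
print(old_pages, old_pages.bit_length() - 1)
print(new_pages, new_pages.bit_length() - 1)
```

The page counts work out to exact powers of two (2 to the 22nd and 2 to the 32nd), which is consistent with a wider page pointer being the only change needed to lift the limit.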

Disk space is the obvious limitation to Exchange 5.5. A system's ability to back up massive stores and, when necessary, to perform a restore, is a less obvious but still very real limitation. If a database grows to 100GB and the system's backup device can write data at 4GB per hour, the backup time becomes an unrealistic 25 hours. Restoring usually takes longer than backing up, and contemplating a 25-plus-hour restore operation is enough to convince even the most intrepid system administrators to invest in the fastest backup device they can lay their hands on. Exchange 5.5 is faster at backup and restore operations than previous versions of Exchange, but you cannot take advantage of this speed if your backup devices are slow.
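
The backup-window arithmetic is worth running against your own numbers. This Python sketch reproduces the 25-hour figure and then tries a faster device; the 20GB-per-hour rate is a hypothetical value for comparison, not a quoted specification.

```python
def backup_hours(store_gb, device_gb_per_hour):
    """Hours needed to stream a store of the given size to a backup device."""
    return store_gb / device_gb_per_hour

slow = backup_hours(100, 4)    # the example in the text
fast = backup_hours(100, 20)   # hypothetical faster device
print(slow, fast)
```

Because restores usually take longer than backups, treat the result as a lower bound on the time users will be without their mailboxes after a disk failure.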

Some people question the wisdom of using one large database to hold the contents of all mailboxes on a server, as Exchange does, and they wonder why Exchange doesn't split the Information Store across multiple physical files. After all, the Information Store is already logically divided across two databases, priv.edb and pub.edb, so why shouldn't the store be split? The advantages of splitting the Information Store's databases include easier and faster backup and restore operations, as well as the ability to divide the I/O load the store generates across multiple disk spindles or controllers. However, these advantages must be weighed against the inevitable loss of single-instance storage if the Information Store's databases were distributed across multiple files. You can see how efficient single-instance storage is as it updates rows in database tables when new messages arrive. But if Exchange distributed mailboxes across multiple databases, messages would not be delivered or logged as efficiently. Will the Exchange engineers scuttle the single-instance storage model? Monitoring this debate will be interesting as very large Information Stores build during the next couple of years.

What happens when the limit is reached? All databases have size limits. If a store reaches a logical or physical limit (i.e., no disk space is available for either the store or transaction logs), the information store service conducts a controlled and orderly shutdown in which the following processing occurs:

When the store reaches its size limit, the store cannot expand. Users will receive error messages when they try to add items to the store. The usual error message is "Network problems are preventing connection to the Microsoft Exchange Server." This generic message masks what is really happening.
Exchange sends event number 1112 for the Information Store to the NT Event Log. This event number states that the store has reached its maximum size and is stopping.
Exchange stops the information store service. Transaction logs hold the data for messages that cannot be committed to the store.

You can restart the information store service. However, when you do, users cannot add items to the store. Users can delete items, but doing so will not shrink the database or eliminate pages within the Information Store. After you restart the Information Store, you can move mailboxes to another server and let users work again. You can also delete mailboxes, but again, this action will not reduce the size of the Information Store. Exchange stores transactions that occur after you restart the Information Store in transaction logs, but it does not commit them to the database. You can observe the transaction logs growing, but the size of the Information Store will remain static.

The only way you can get the Information Store back into normal operation after it shuts down because it has reached its size limit is to take it offline and compress it. Compressing the Information Store should make it smaller. (I'll discuss compressing the Information Store in more detail next month.) Then take a backup and restart the information store service. Exchange will then commit all outstanding transactions to the database, which could take a few minutes to complete if it must process several logs.

Tune In Again Next Month
We covered a lot of ground this month looking at the Exchange Information Store's structure, discussing how transaction logging works, and understanding what happens when problems occur. Next month I'll show you how to maintain the Information Store and get the most from its unique capabilities.
