If you want to start an argument with a Microsoft Exchange Server administrator, try giving unsolicited advice about storage design and configuration. Exchange 2000 Server and Exchange Server 2003 offer so many storage options that determining an optimal configuration is often difficult. In addition, some common and enduring Exchange misconceptions further complicate the decision-making process. In this article, I discuss some lesser-known Exchange storage design principles that will help you clarify what works so you can make the best design decisions for your environment.
Exchange Server 5.5 uses a monolithic database design, with a maximum of three databases on each server: a mailbox database, a public folder database, and a directory database. This design allows some truly scary configurations; for example, I once had a customer with an average Exchange 5.5 mailbox database size of 140GB.
Exchange 2000 introduced the concept of multiple storage groups (SGs), each of which can contain multiple databases; Exchange 2003 uses the same mechanism. The SG (a logical object that doesn't exist on the hard disk) is an instance of the Exchange Information Store (IS) that runs within the store.exe process and owns the transaction logs for all the mailbox and public folder databases in the group. Each database is a separate logical object with a pair of physical disk files (the .edb and .stm files). Many customers who upgrade from Exchange 5.5 to Exchange 2000 or Exchange 2003 accept the default migration settings. This practice isn't advisable because you get Exchange 5.5's huge single database rather than having the benefit of multiple databases.
You can back up or restore only one database at a time per SG. If you have multiple SGs, you can back up or restore multiple databases simultaneously. Suppose you take 140GB of mailbox data and divide it among four SGs, each with a 35GB database. Backing up the four databases one after another takes the same total amount of time as backing up one 140GB database; however, each individual backup takes about a quarter of the time. If you back up to tape, you can add a second tape drive to back up two databases in parallel and cut the backup time in half. But the biggest performance gain occurs if you restore multiple databases in parallel. If you've backed up multiple SGs, you can restore one database from each SG at the same time and significantly reduce the overall restore time.
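The arithmetic behind the 140GB example can be sketched in a few lines of Python. This is a rough model, not a sizing tool: the throughput figure (35GB per hour per tape drive) and the function name are assumptions made for illustration, and the model ignores tape changes, verification passes, and log replay.

```python
def elapsed_hours(db_sizes_gb, streams, gb_per_hour):
    """Estimate wall-clock hours to back up (or restore) a set of databases.

    Each database is handed to whichever stream (e.g., tape drive) frees up
    first. Because only one database per SG can be processed at a time,
    `streams` should not exceed the number of SGs.
    """
    finish = [0.0] * streams
    for size in sorted(db_sizes_gb, reverse=True):
        i = finish.index(min(finish))      # stream that becomes free first
        finish[i] += size / gb_per_hour
    return max(finish)

RATE = 35.0  # ASSUMED throughput per drive in GB/hour (hypothetical value)

one_big_db = elapsed_hours([140], streams=1, gb_per_hour=RATE)       # 4.0
four_dbs_serial = elapsed_hours([35] * 4, streams=1, gb_per_hour=RATE)  # 4.0
four_dbs_two_drives = elapsed_hours([35] * 4, streams=2, gb_per_hour=RATE)  # 2.0
four_dbs_four_drives = elapsed_hours([35] * 4, streams=4, gb_per_hour=RATE)  # 1.0
```

With one drive, splitting the data changes nothing about the total; the gain appears only when a second (or fourth) stream runs in parallel.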
Another good reason to partition your storage is that doing so can help you abide by your service level agreements (SLAs). Suppose you have an SLA that requires you to restore executives' access to email within an hour of an outage but gives you a five-hour window for other employees. If you put the executives' and employees' mailboxes into separate SGs, you can restore the databases independently. Assuming that you have fewer executives than employees, you should be able to restore the executives' email access according to your SLA.
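To see how separate SGs help with an SLA like this one, consider a small Python sketch. The database sizes (20GB for executives, 120GB for other employees) and the restore rate (40GB per hour) are hypothetical numbers chosen only to make the point; real restore times also include transaction log replay, which this sketch leaves out.

```python
def restore_hours(db_size_gb, gb_per_hour):
    """Hours needed to restore one database at a given throughput."""
    return db_size_gb / gb_per_hour

def meets_sla(db_size_gb, gb_per_hour, sla_hours):
    """True if the database can be restored inside the SLA window."""
    return restore_hours(db_size_gb, gb_per_hour) <= sla_hours

RATE = 40.0  # ASSUMED restore throughput in GB/hour (hypothetical)

# ASSUMED layout: executives and employees in separate SGs.
databases = {
    "executives": {"size_gb": 20.0, "sla_hours": 1.0},
    "employees": {"size_gb": 120.0, "sla_hours": 5.0},
}

split_results = {
    name: meets_sla(db["size_gb"], RATE, db["sla_hours"])
    for name, db in databases.items()
}

# Same data in one 140GB database: executives must wait for the whole restore.
combined_ok_for_executives = meets_sla(140.0, RATE, sla_hours=1.0)
```

Under these assumptions, both separate databases come back inside their windows (0.5 hour and 3 hours), whereas a single combined database would take 3.5 hours and miss the one-hour executive target.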
Microsoft's recommendation with the initial release of Exchange 2000 was to create the smallest number of SGs possible because each additional SG required a fixed allocation of between 100MB and 250MB of RAM—a significant amount at the time. Exchange 2000 Service Pack 3 (SP3) includes RAM allocation process modifications that dramatically reduce the amount of RAM required for additional SGs. Now, Microsoft's recommendation is to create as many SGs as possible. To draw on my earlier example, Microsoft recommends creating four SGs with one database each instead of one SG with four databases because each Exchange SG has its own set of logs. If you have only one database per SG, each database essentially has its own set of logs. This configuration simplifies and expedites disaster recovery because only one database's transaction logs must replay when you restore the database.
Once upon a time, administrators debated whether using RAID with Exchange was a good idea. That debate has long since been put to rest; administrators know that RAID can add a valuable degree of protection to Exchange data. Now the debate has turned to the type of RAID to use.
To determine which type of RAID to use, you need to remember that each RAID level balances performance against recoverability. What's good for one data type can be bad for another. Imagine a striped (RAID 0) volume with two disks. Striping gives you great speed because applications can read from and write to all physical disks at the same time. But if you lose one disk in the stripe set, you effectively lose the whole volume. This design might be acceptable in situations in which the performance boost would be beneficial but a transient disk failure wouldn't be the end of the world (e.g., for SMTP queues on a gateway machine). However, you'd have to be fairly risk-tolerant to put your databases on such a volume.
Microsoft's general recommendation is to use mirroring for data when protection is most important (e.g., transaction logs, the system volume). When data protection and access speed are both important, use either RAID 5 or RAID 0+1. If you have the budget, RAID 0+1 is preferable.
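The trade-off among these RAID levels is easier to compare with numbers. The following Python sketch computes usable capacity and the number of disk failures each level is guaranteed to survive. The function name and the 72GB disk size are assumptions for illustration; controller-specific behavior and performance are not modeled.

```python
def raid_summary(level, disks, disk_gb):
    """Return (usable_gb, guaranteed_failures_tolerated) for a RAID set.

    Supports "0" (striping), "1" (two-disk mirror), "5" (striping with
    parity), and "0+1" (mirrored stripes).
    """
    if level == "0":
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb, 0
    if level == "1":
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a 2-disk mirror")
        return disk_gb, 1
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb, 1   # one disk's worth of parity
    if level == "0+1":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, 4 or more")
        return (disks // 2) * disk_gb, 1  # half the disks hold the mirror copy
    raise ValueError(f"unsupported RAID level: {level}")

DISK_GB = 72  # ASSUMED disk size (hypothetical)

mirror_for_logs = raid_summary("1", 2, DISK_GB)      # (72, 1)
raid5_for_dbs = raid_summary("5", 7, DISK_GB)        # (432, 1)
raid01_for_dbs = raid_summary("0+1", 8, DISK_GB)     # (288, 1)
stripe_for_queues = raid_summary("0", 2, DISK_GB)    # (144, 0)
```

RAID 0+1 gives up half its raw capacity, which is why budget is the deciding factor between it and RAID 5.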
Logs and Databases
When you install or upgrade Exchange and accept the default log and database locations, all your Exchange data is stored on one volume. However, Microsoft has long recommended that you put transaction logs and databases on separate volumes because of the differences in their access patterns. Log files are always written to sequentially, and they're read (also sequentially) only during log playback. Databases are written to and read from in essentially random patterns, according to users' requests. Thus, putting your log files and databases on the same disk volume is a bad idea for two reasons: Doing so impairs performance and can compromise your ability to recover data in the event of a disk failure. These risks are present even if you're using RAID arrays instead of plain physical disks. Consider a case in which you have one large RAID 5 array with 10 disks that contains transaction logs and databases for two SGs. A better configuration from a performance and disaster recovery standpoint is to use two disks to make a mirrored volume for the transaction logs, dedicate seven disks for a RAID 5 array for the databases, and keep one unallocated disk as a hot spare. Depending on the database access patterns, you can also create separate RAID 5 volumes (each with its own set of physical disks) for the databases.
Keep in mind that online full backups remove the transaction logs. If you see a lot of log files after a backup finishes, you need to investigate because the backup wasn't successful. Never manually delete transaction log files without a good reason for doing so—such as if Microsoft Customer Service and Support (CSS) advises you to. Even then, you need to ensure that you have a current copy of the logs stored in a safe location before you delete them.
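One way to spot a backup that didn't remove the logs is to count the log files that predate the end of the backup. The Python sketch below is a hypothetical check, not a Microsoft tool: the `E*.log` file-name pattern, the threshold of 20 files, and the function names are all assumptions that you would adjust for your own SG log prefix and environment. It only reads file timestamps and never deletes anything.

```python
from pathlib import Path

def logs_older_than(log_dir, backup_finished, pattern="E*.log"):
    """Return log files last modified before the backup finished.

    `backup_finished` is a POSIX timestamp (seconds since the epoch).
    `pattern` is an ASSUMED naming convention for transaction logs.
    """
    stale = [
        p for p in Path(log_dir).glob(pattern)
        if p.stat().st_mtime < backup_finished
    ]
    return sorted(stale)

def needs_investigation(log_dir, backup_finished, threshold=20):
    """True if more than `threshold` (ASSUMED value) old logs remain."""
    return len(logs_older_than(log_dir, backup_finished)) > threshold

# Example call (hypothetical path and timestamp):
# needs_investigation(r"E:\Exchsrvr\SG1Logs", backup_finished=1_100_000_000)
```

A handful of older logs can remain legitimately, which is why the check uses a threshold rather than flagging any leftover file.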
Circular logging is a poorly understood Exchange feature that many administrators dismiss as inherently risky, and with good reason. When you enable circular logging, Exchange limits the total amount of disk space it uses for transaction logging by overwriting older transaction logs once the transactions they contain have been committed to the database, rather than keeping them until the next full backup. Although this method makes sense in theory, in practice it means that you won't have a complete set of logs for a particular database, which means that you can't roll the database forward past your last backup in the event of a failure.
A couple of situations exist in which you might want to enable circular logging for an SG. One case is on front-end SMTP servers. The Exchange 2003 SMTP service requires that you have a mailbox database mounted so that it can generate nondelivery reports (NDRs). Over time, that database accumulates transaction logs unless you enable circular logging. Another situation that calls for circular logging is if you're performing a task that generates a high number of transactions. For example, suppose you need to move 500 mailboxes from one server to another overnight. Doing so will generate a volume of transaction logs on the two servers approximately equal to the volume of mail being moved, which is a substantial amount. To solve this problem, perform full backups of both servers, then enable circular logging on the SG from which you're moving the mailboxes. You might also want to enable circular logging on the target server, although leaving it off is the safer option. When the move is complete, turn circular logging off again and run another full backup so that you have a fresh recovery baseline. In most other situations, leave circular logging off to avoid overwriting transaction data that you might need later.
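To get a feel for how much log data a mailbox move produces, you can run a quick estimate. This Python sketch relies on the rule of thumb given above (log volume roughly equals the volume of mail moved) and on the 5MB transaction log file size that Exchange 2000 and Exchange 2003 use. The average mailbox size of 100MB and the free-space figure are hypothetical.

```python
import math

LOG_FILE_MB = 5  # Exchange 2000/2003 transaction log files are 5MB each

def estimate_move_logs(mailboxes, avg_mailbox_mb):
    """Return (total_log_mb, log_file_count) for a mailbox move.

    Uses the rule of thumb that log volume is about equal to the
    volume of mail moved.
    """
    total_mb = mailboxes * avg_mailbox_mb
    return total_mb, math.ceil(total_mb / LOG_FILE_MB)

def log_volume_has_room(total_log_mb, free_mb):
    """True if the log volume can hold the estimated logs."""
    return total_log_mb <= free_mb

# 500 mailboxes at an ASSUMED average of 100MB each
total_mb, file_count = estimate_move_logs(500, 100)   # (50000, 10000)

# ASSUMED 30GB (30,720MB) free on the log volume
fits = log_volume_has_room(total_mb, free_mb=30 * 1024)  # False
```

In this hypothetical case, the move would generate about 50GB of logs (10,000 files), more than the log volume could hold without circular logging or an intermediate backup.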
SANs offer some powerful Exchange storage benefits. Having a flexible storage system that lets you allocate and reallocate space on the fly is useful in itself. In addition, being able to logically reassign and move volumes between hosts lets you take full advantage of clustering, point-in-time copies, and other technologies that depend on or benefit from SANs.
However, SANs also introduce some additional storage variables, most of which are specific to the SAN vendor. For example, the way in which a SAN controller allocates physical disks to logical volumes varies among vendors. Some vendors suggest that when you create an aggregate volume, you put as many disks in it as possible; others don't. Some vendors have transparent support for reallocating storage on hot spots; others don't. You need to know and understand your SAN vendor's recommendations for how to allocate and provision Exchange storage. A strategy that works well for one SAN vendor's system might not work for another's.
In general, vendor recommendations follow the Microsoft recommendations I've discussed. The most common discrepancies are regarding the size and type of disks to use for a particular configuration, and RAID design and allocation (e.g., filers from Network Appliance, or NetApp, typically use RAID 4 rather than RAID 5). Before you commit Exchange mailboxes to a SAN setup, use the Microsoft Jetstress tool (jetstress.msi) to evaluate the setup's performance. To download this tool, go to http://www.microsoft.com/downloads/details.aspx?familyid=94b9810b-670e-433a-b5ef-b47054595e9c&displaylang=en. In addition, involve your SAN vendor's engineers to ensure that your SAN design and layout are appropriate for your Exchange requirements. Doing so will help protect you against unpleasant and potentially expensive surprises as your storage requirements grow.
Make the Best of Change
Exchange storage technologies have changed significantly in the 10 years since Exchange first shipped. Knowing how to make the best of those changes will help you effectively design and operate an Exchange system that offers superior reliability and performance. For more information about Exchange storage and design, see the resources listed in the Learning Path box.