In "Scaling Out OWA," June 2003, http://www.exchangeadmin.com, InstantDoc ID 38528, I discussed some of the design decisions that I made before building a messaging data-center platform that supports 350,000 Microsoft Outlook Web Access (OWA) users. As promised, I now describe the specific requirements of and solutions for each server in that project, as well as some of the storage-configuration details and other more subtle aspects of server tuning that I considered.
Designing storage for 350,000 users in a front-end/back-end server environment requires a mix of approaches specific to the involved server types. Some servers have high I/O throughput requirements, others have low I/O requirements, and others have high storage-capacity requirements. Many of the ideas I explore in this article build on concepts and assumptions I developed in "Scaling Out OWA," so be sure to read that article first.
I derived the front-end server hardware configurations in "Scaling Out OWA" from benchmarks, but you can apply some general rule-of-thumb guidelines. Front-end server memory consumption is generally about 40KB per active OWA client connection. If you support no more than 20,000 active users (per the workload assumptions defined for our environment), this memory consumption yields a nominal aggregate memory requirement of approximately 780MB.
Each front-end server is configured with 2GB of memory, so setting aside 1GB for the OS, Exchange 2000 Server, and third-party applications leaves three servers running at approximately 30 percent memory usage for the OWA workload. If a front-end server fails, this configuration provides sufficient headroom to absorb the displaced workload. OWA clients should never explicitly specify a particular front-end server to connect to. Rather, users should always connect to a generic URL that resolves to hardware-based load balancing—or, less preferably, Windows NT Load Balancing Service (WLBS)—across the front-end servers. That way, a failure of one or more front-end servers won't affect the service.
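As a sanity check, the front-end memory arithmetic reduces to a few lines. The 40KB-per-connection figure, the 2GB of physical memory, and the 1GB set aside for the OS are the numbers given above; the rest is simple division:

```python
KB_PER_CONNECTION = 40      # approximate memory per active OWA connection
ACTIVE_USERS = 20_000       # peak concurrent users assumed for this environment
FRONT_END_SERVERS = 3
AVAILABLE_MB = 2048 - 1024  # 2GB per server minus 1GB reserved for OS and apps

aggregate_mb = ACTIVE_USERS * KB_PER_CONNECTION / 1024   # ~781MB in total
per_server_mb = aggregate_mb / FRONT_END_SERVERS         # ~260MB per server
usage_pct = 100 * per_server_mb / AVAILABLE_MB           # roughly 25-30 percent

print(f"{aggregate_mb:.0f}MB aggregate, {per_server_mb:.0f}MB per server, "
      f"{usage_pct:.0f}% of available memory")
```

With one server down, the same workload spread across two servers still consumes well under half of each remaining server's available memory, which is the redundancy argument made above.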
Benchmark testing for this configuration shows that front-end servers exhibit a relatively light CPU load when they support a 20,000-user workload distributed across seven back-end servers. A CPU workload shouldn't exceed 20 percent—even at peak times when many users are logging on and initially accessing their mailboxes. While under load, front-end servers have almost nonexistent I/O subsystem requirements. The front-end server's function is solely to proxy traffic to back-end servers, so no data is written to disk for databases or transaction logs unless you're running SMTP on the front end, in which case you must mount a private store for nondelivery report (NDR) delivery. Accordingly, you don't need to provide any explicit Exchange storage on front-end servers.
As a best practice, configure front-end servers with one logical volume, comprising two 18GB 15,000rpm disks that use RAID 1 mirroring, with an effective capacity of 18GB. Always use Direct Attached Storage (DAS)—not Storage Area Network (SAN)-based storage—for front-end servers. DAS is the most cost-effective way to provide small volumes of storage for servers. In addition, the performance of locally attached storage is generally better than SAN-attached storage for I/O write operations, particularly in terms of latency.
Benchmark testing shows that seven back-end servers are necessary—for various reasons, including database size, CPU load, and so on—to support a maximum load of 20,000 active users. This number yields a load of approximately 3000 users per back-end server. Although no particular rule of thumb exists that associates numbers of active users with amounts of memory on Exchange 2000 back-end servers, I recommend memory configurations of around 3GB for servers with this type of user load.
Each back-end server consists of four 1.6GHz processors. Such a server will comfortably support a processor workload of approximately 3000 OWA users. Benchmark testing generated the loading characteristics that Figure 1 shows, in which 3000 users incur an aggregate processor workload of approximately 46 percent.
The results you achieve in an actual production environment might vary, given the nature of real workloads versus simulations. In addition, times of peak demand (e.g., mornings, immediately after lunch, evenings) might result in higher observed and sustained CPU workloads. The indicated CPU load of 46 percent might seem to suggest a poor usage of CPU resources, but keep in mind two considerations. First, this value is an average that doesn't take into account the aforementioned peaks (for which you must allow sufficient headroom to provide adequate levels of service). Second, merely having the CPU capability to support more users doesn't imply that you should place more users on the system. (Keep in mind that your system has further CPU requirements, such as storage allocation and backup/restore capabilities.)
The I/O subsystem is the most interesting configuration area for the entire data-center messaging platform. Because you're dealing with huge data volumes and complex RAID variations, the storage subsystem configuration on back-end servers requires the sophistication of SAN-based storage. Let's review the back-end server storage configurations that I covered in "Scaling Out OWA." In this configuration, the OWA back-end servers consist of three storage groups (SGs), each containing four databases. Maximum storage allocation per SG is 180GB; thus, the maximum expected size of each database is 45GB. Database sizes of 45GB or less facilitate short-duration backups and restores, so you don't want to exceed this threshold.
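To make the arithmetic explicit, the storage-group layout above reduces to a few lines (all numbers come from the configuration described in this article):

```python
STORAGE_GROUPS = 3       # storage groups per back-end server
DATABASES_PER_SG = 4
MAX_SG_GB = 180          # maximum storage allocation per storage group

max_db_gb = MAX_SG_GB / DATABASES_PER_SG              # 45GB ceiling per database
db_storage_per_server_gb = STORAGE_GROUPS * MAX_SG_GB  # 540GB of database storage

print(max_db_gb, db_storage_per_server_gb)  # 45.0 540
```

Keeping each database at or below the 45GB ceiling is what preserves the short backup and restore windows mentioned above.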
The SAN takes care of all the back-end servers' disk volume configurations. (No local disks are used.) The SAN presents each server with seven disk volumes. Of these volumes, one is devoted to each of the three SGs and another is devoted to each SG's transaction log. The remaining SAN-served volume is used for the OS.
You should always keep SG volumes and their associated transaction logs physically separate. From the perspective of the servers, then, these seven volumes are independent; the SAN controller presents them as physically separate. Because all the volumes are created from one huge array of disks, the SAN controller uses virtualization technology to present logically separate volumes. (SAN-controller virtualization technology effectively uses hundreds of disk spindles to present one volume to a server. For more information about storage virtualization, see the sidebar "RAID Virtualization.") In my tests, I used the following disk volume configurations for back-end servers:
- C—system volume (OS and applications); RAID 1; effective capacity 9GB
- E—transaction logs 1; RAID 1; effective capacity 18GB
- F—transaction logs 2; RAID 1; effective capacity 18GB
- G—transaction logs 3; RAID 1; effective capacity 18GB
- H—SG 1; RAID 0+1; effective capacity 180GB
- I—SG 2; RAID 0+1; effective capacity 180GB
- J—SG 3; RAID 0+1; effective capacity 180GB
The total SAN storage requirement for the data-center platform is 4824GB. The SAN, which consists of one hundred sixty 72.8GB 10,000rpm disks, provides this storage. (Total raw storage is 11,648GB.)
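These totals check out with a quick calculation. One observation—that 4824GB works out to exactly eight sets of the per-server volumes listed above, perhaps allowing for a spare set—is an inference on my part, not a stated fact:

```python
# Per-server volume set, using the effective capacities listed above (GB)
SYSTEM_GB = 9
LOG_GB = 3 * 18       # three RAID 1 transaction-log volumes
SG_GB = 3 * 180       # three RAID 0+1 storage-group volumes

per_server_gb = SYSTEM_GB + LOG_GB + SG_GB    # 603GB per back-end server
total_stated_gb = 4824
raw_gb = 160 * 72.8                           # one hundred sixty 72.8GB spindles

print(per_server_gb, total_stated_gb / per_server_gb, raw_gb)  # 603 8.0 11648.0
```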
To ensure that no single points of failure occur, each back-end server is connected to the SAN through multiple and redundant connections. Figure 2 shows a simplified diagram of these connections for one back-end server.
Benefits of Using SAN Boot Technology
Using a SAN-served volume as a boot disk has a specific redundancy benefit: Any back-end servers that share the same hardware configuration can boot from the SAN volume and take on one another's identity. (Of course, only one server can boot from the SAN boot volume at any one time.) If a back-end server suffers a hardware failure (e.g., a system board failure), you can simply reconfigure the SAN volume presentation so that the boot volume originally presented to the failed back-end server is now presented to a spare back-end server (assuming you have a spare server with the same hardware configuration).
This approach is a simple, cost-effective way to provide some degree of redundancy. For more information about Microsoft's support of SAN boot configurations, see the Microsoft article "Support for Booting from a Storage Area Network (SAN)" at http://support.microsoft.com/?kbid=305547. If you expect your system to page, use a locally attached disk for the pagefile. However, a system with a large amount of memory, like mine, is unlikely to page.
SMTP Relay Server
In addition to using a combination of front-end and back-end servers, an Exchange-based data-center messaging platform should allocate dedicated systems as SMTP relays. In our environment, we used two SMTP relay servers, configured as dual-processor 2.8GHz Xeon servers with 3GB of memory and DAS. Each server processes inbound and outbound traffic, and a hardware switch load-balances SMTP connections to them. (Cisco Systems and Alteon load-balancing switches are good choices, or you could use Windows 2000 Network Load Balancing—NLB.) In our environment, the SMTP relays route SMTP traffic to another part of the network, in which the traffic undergoes further antivirus, content-filtering, and antispam processing.
Typically, SMTP relay servers use memory resources to maintain connections and context for message queues. Because SMTP connections are transient, SMTP relays generally don't maintain thousands of connections; thus, the memory requirement for maintaining connections is low. Memory requirements for maintaining context for message queues can be higher. An open message in the queue requires approximately 10KB of memory. A closed message in the queue requires approximately 4KB of memory. All consumed memory is part of the working set allocated to the inetinfo.exe process. Each open message consumes 5KB of kernel memory.
Additionally, Windows assigns a file handle to each open message. The HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SMTPSVC\Queuing\MsgHandleThreshold registry subkey controls the maximum number of file handles, which by default on Exchange 2000 Service Pack 2 (SP2) and later is 1000 plus an additional 1000 per 256MB block of physical memory. (Earlier versions set the subkey's default value at just 1000.) Thus, for our servers, which have 3GB of memory, MsgHandleThreshold's value is automatically set to 13000. A large MsgHandleThreshold value lets the SMTP server process messages in the queue at a fast rate. The value has an upper limit of 16000, but Microsoft recommends never explicitly setting the value greater than 15000. For more information about this value, see the Microsoft article "XGEN: Exchange 2000 Server SMTP Optimized with Maximum Handle Threshold Registry Key" (http://support.microsoft.com/?kbid=271084).
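The default-value rule above can be expressed as a small function. Treating the 16000 upper limit as a clamp on the computed default is an assumption on my part; the text states only that you should never explicitly set the value above 15000:

```python
def default_msg_handle_threshold(physical_memory_mb: int) -> int:
    """Default MsgHandleThreshold on Exchange 2000 SP2 and later:
    1000 file handles plus 1000 per full 256MB block of physical memory,
    clamped (by assumption) to the stated upper limit of 16000."""
    return min(16_000, 1000 + 1000 * (physical_memory_mb // 256))

print(default_msg_handle_threshold(3 * 1024))  # 13000 for a 3GB relay server
```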
The default value is appropriate for the message throughput you expect to see on SMTP relay servers. If kernel memory usage becomes problematic and the SMTP relays are processing huge quantities of messages (leading to unresponsive or crashing servers), reduce handle consumption by modifying the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SMTPSVC\Queuing\MsgHandleThreshold and HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SMTPSVC\Queuing\MsgHandleAsyncThreshold registry subkeys. Set both subkeys to the same desired hexadecimal value (of type REG_DWORD).
The HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Exchange\Mailmsg\MaxMessageObjects registry subkey determines the maximum number of messages that an SMTP relay server can accommodate in its queues. This subkey's default value (of type REG_DWORD) is 0x000186a0 (100000 in decimal; note that unless you change the value, the subkey won't appear in the registry). When the SMTP queues fill up with the maximum number of messages, the SMTP service refuses to accept further SMTP messages from any other SMTP relays (either internal or external) until you clear some messages from the queues.
By reducing the number of file handles that are cached in the kernel space, you can control consumption of the kernel memory. You probably won't need to reduce the default value of 800, but if you decide to do so, modify the value (of type REG_DWORD) of the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters\FileCacheMaxHandles registry subkey. For more information, see the Microsoft article "XGEN: Modifying Exchange 2000 Server File Handle Cache Parameters" at http://support.microsoft.com/?kbid=267551. If you change any of the aforementioned registry subkeys, be sure to stop and restart the SMTP virtual servers.
Simple calculations in our environment show that the expected maximum memory consumption—assuming a worst-case scenario of full queues—is approximately 543MB (65MB kernel memory, plus 130MB for open messages, plus 348MB for closed messages). If we deemed it necessary after evaluating traffic volumes, we could increase the MaxMessageObjects value so that Exchange would rarely refuse an SMTP message submission. (Note that we're using two SMTP relay servers, so in theory we could hold as many as 200,000 messages in the queues—ample space for most temporary outages.)
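The worst-case arithmetic can be reproduced as follows. The assumption that the number of open messages is bounded by the MsgHandleThreshold value of 13000 is mine, but it makes the figures match those quoted above (the MB figures round as if 1MB = 1000KB):

```python
MAX_MESSAGES = 100_000      # MaxMessageObjects default (queues completely full)
OPEN_MESSAGES = 13_000      # assumed cap: MsgHandleThreshold on a 3GB server
CLOSED_MESSAGES = MAX_MESSAGES - OPEN_MESSAGES   # 87,000 closed messages

# Per-message costs from the text: 10KB open, 4KB closed, 5KB kernel per open
kernel_mb = OPEN_MESSAGES * 5 / 1000     # 65MB of kernel memory
open_mb = OPEN_MESSAGES * 10 / 1000      # 130MB for open messages
closed_mb = CLOSED_MESSAGES * 4 / 1000   # 348MB for closed messages

print(kernel_mb + open_mb + closed_mb)   # 543.0
```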
Benchmark tests show that systems equipped with dual 2.8GHz processors will process more than 100 messages per second, with relatively little context switching (fewer than 10,000 context switches per second), at about 70 percent processor usage. Every message that the SMTP relay server processes is written to disk. The SMTP service's write buffer is 32KB. In general, the SMTP service performs seven disk writes for every queued message that's less than 32KB—in other words, for every queued message that can fit into the write buffer. An additional write is necessary for every additional 32KB of message extent. Therefore, we must optimize the I/O subsystem disk configuration for this I/O pattern. The disk configurations on the SMTP relay servers are as follows:
- C—Two 18GB 15,000rpm disks; OS and applications; RAID 1; effective capacity 18GB
- E—Six 18GB 15,000rpm disks; SMTP queues; RAID 0+1; effective capacity 54GB
- F—Four 18GB 15,000rpm disks; database volume; RAID 0+1; effective capacity 36GB
- G—Two 18GB 15,000rpm disks; transaction logs; RAID 1; effective capacity 18GB
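The write pattern described above can be sketched as a small estimator. How partial buffer-sized blocks round is an assumption here; the text specifies only seven writes for a message that fits the 32KB buffer, plus one per additional 32KB:

```python
import math

WRITE_BUFFER_KB = 32  # size of the SMTP service's write buffer

def smtp_disk_writes(message_kb: float) -> int:
    """Approximate disk writes per queued message: seven writes for a
    message that fits the 32KB buffer, plus one more write for each
    additional 32KB of message extent (rounding up is an assumption)."""
    extra_blocks = max(0, math.ceil(message_kb / WRITE_BUFFER_KB) - 1)
    return 7 + extra_blocks

print(smtp_disk_writes(8), smtp_disk_writes(100))  # 7 10
```

At more than 100 messages per second, even small messages generate on the order of 700 disk writes per second against the queue volume, which is why the queues get the six-disk RAID 0+1 set.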
By default, Exchange writes messages on the disk that holds the Exchange binaries. In this scenario, the default location is C:\program files\exchsrvr\mailroot. We reconfigured Exchange to point to the E drive, which is the most suitable location for high-performance and high-capacity storage. To accomplish this reconfiguration in your environment, use ADSI Edit to navigate to the Configuration Naming Context within Active Directory (AD). Locate the ConfigurationContainer|CN=1,CN=SMTP,CN=Protocols,CN=servername,CN=Servers,CN=AdminGroupname,CN=AdministrativeGroups,CN=orgname,CN=Microsoft Exchange,CN=Services,CN=Configuration path, where servername is the name of the hosting server, AdminGroupname is the name of the Admin Group in which the server resides, and orgname is the name of your Exchange organization. Open the object's Properties, set the view to Both, and modify the paths for msExchSmtpBadMailDirectory, msExchSmtpPickupDirectory, and msExchSmtpQueueDirectory to point to the E drive instead of the C drive.
You must wait for these changes to replicate to all other Win2K domain controllers (DCs) before you restart the Microsoft Exchange System Attendant on each mail relay, which copies the new settings into the Microsoft IIS metabase. Wait a few minutes, then reboot the SMTP relay server so that it comes up with the new configuration.
Think About It
Building a data-center messaging platform for hundreds of thousands of users requires much thought, particularly when you want to map physical resources (e.g., storage) to virtual entities such as SGs and databases. The most challenging aspect is achieving an optimum storage configuration. Other concepts, such as memory and processors, are easier to handle. The advent of new storage technologies such as virtualization helps you achieve a beneficial configuration of storage volumes for Exchange servers. You simply specify what you want to present to each server, then leave all the "slicing and dicing" of data across disks to the storage controller.