Microsoft Exchange Server system architects usually base their infrastructure design on some rules of thumb. For example, you should have at least 128Kbps of network bandwidth available if you put two servers in the same site and at least 64Kbps of network bandwidth available before you use an Exchange site connector. However, sometimes these rules keep you from getting the most out of your system. When you understand how Exchange Server works and what goes on between the Message Transfer Agent (MTA) and the Directory Service (DS), you can stretch these rules.
Let's look at a company that broke the rules—or at least stretched them—to build a global messaging system. With offices spread over three continents, this company has achieved consistent worldwide message delivery times of less than 10 seconds.
NatWest GFM, a trading and investment bank in the UK-based NatWest Group, needed a reliable, scalable, and—above all—fast messaging environment. The existing messaging infrastructure had evolved haphazardly and offered global message delivery times of about 4 hours. The company's management wanted a new infrastructure that would deliver standard messages in less than 10 minutes.
The company has offices in Tokyo, Singapore, Hong Kong, London, New York, Madrid, Beijing, Taiwan, and Shanghai, and company personnel travel around the world. In the company's old system, WAN connection speeds between offices varied from 1Mbps between major offices to only 2Kbps committed information rate (CIR) between minor offices.
Advantages of One Site
After some preliminary research, NatWest GFM believed that it could put all its servers in one site. The company came to this conclusion for several reasons.
First, a single site is the fastest design possible because of its inherent fully meshed structure—every server communicates directly with every other server. When you have multiple sites, communication takes place between sites through one or more bridgeheads. This situation means you have one or more additional links in each chain from sender to recipient. Each link in the chain represents a point of potential failure.
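To put the fully meshed structure in numbers, the short sketch below counts the direct server-to-server pairings in a single site. The server counts are illustrative assumptions, not figures from the NatWest GFM design.

```python
def full_mesh_pairs(server_count: int) -> int:
    """Number of distinct server pairs in a fully meshed site."""
    if server_count < 0:
        raise ValueError("server_count must not be negative")
    return server_count * (server_count - 1) // 2


# Hypothetical site sizes, chosen only to show how quickly the mesh grows.
for servers in (2, 5, 10, 15):
    print(f"{servers} servers -> {full_mesh_pairs(servers)} direct pairings")
```

Every pairing is a potential direct conversation, which is why the mesh is fast and also why the number of servers at the end of a slow link matters.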
Second, London- and New York-based administrators could easily administer the entire organization with a single-site arrangement. Some of the remote Asian locations could offer only limited support for Exchange Server systems. Although you can centrally administer a multiple-site environment if you have appropriate permissions in place, a single site makes this process slightly easier because administrators don't need to manually connect to each site to make changes. One site also promotes the idea of internal unification within the IS infrastructure. Administrators of different sites tend to make changes that might not be in place at other sites. A single site enforces common standards such as a common Internet email address format or Global Address List (GAL) display format.
Third, single sites accommodate mobile users well. The bank's personnel routinely work abroad for several weeks or months at a time. Because Exchange Server doesn't offer a standard capability for moving mailboxes between sites, you have few options to provide messaging facilities to company personnel on different continents. With a single site, you can move mailboxes from one server to another easily.
Fourth, servers work together better in one site than they do in multiple sites. In one site, public folders, custom forms, calendar information, custom directory attributes, and event scripts work better and spare you the difficulties of getting them to work across site boundaries.
The theory behind the one-site solution was sound, but the project team had to prove the new design would work before proceeding.
The Technical Assessment
After NatWest GFM decided on a one-site strategy, those of us on the design team needed to determine whether implementing a single site was possible. At design time in mid-1998, few other organizations had global sites. Key prerequisites of a one-site strategy include the following:
- A permanent and reliable connection must exist between all servers in the site.
- Physical network connectivity must exist between all Exchange servers.
- Every Exchange server must have name resolution to every other Exchange server in the site.
- Every client must have name resolution to every Exchange server in the site.
- Every Exchange server must use the same service account.
- To avoid remote procedure call (RPC) errors, the WAN must have sufficient capacity to let servers respond to RPC requests within 30 seconds.
Although NatWest GFM had a very good WAN, some of the links were inadequate. For example, the frame relay WAN link between Tokyo and Singapore was only 24Kbps CIR in one direction and 16Kbps in the other. Also, the WAN link between some of the Asian offices and New York was just 2Kbps, though studies had shown that the link consistently experienced bursts of up to 24Kbps. With frame relay networks, the network provider lets you use more bandwidth than you've paid for as long as no one else is using it, but that privilege can disappear at any time. You can't base a design on bandwidth rates that aren't guaranteed. The bank decided to upgrade to 48Kbps CIR in both directions for Tokyo and Singapore and force all Asia-New York traffic through London.
One reason for the upgrade was to ensure that the Exchange servers had sufficient capacity to prevent RPC errors. When an Exchange server issues an RPC request, it expects an acknowledgment within 30 seconds. If the network is too congested or a fault occurs, Exchange Server reports errors and communications might fail.
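As a rough feel for what the 30-second window means on a slow link, the sketch below computes the most data a link could carry in that time at its CIR. It is a back-of-the-envelope bound under stated assumptions, not a model of how Exchange Server sizes its RPC traffic; the function name is hypothetical.

```python
RPC_TIMEOUT_SECONDS = 30  # acknowledgment window described in the text


def bytes_per_window(cir_kbps: float, window_seconds: float = RPC_TIMEOUT_SECONDS) -> float:
    """Upper bound on bytes a link can carry in one window at its CIR.

    Assumes 1Kbps = 1,000 bits per second and ignores protocol overhead,
    latency, and competing traffic, so real capacity is lower.
    """
    return cir_kbps * 1000 * window_seconds / 8


# CIR figures quoted in the article for the links before and after the upgrade.
for cir in (2, 16, 24, 48):
    print(f"{cir}Kbps CIR -> at most {bytes_per_window(cir):,.0f} bytes in {RPC_TIMEOUT_SECONDS} seconds")
```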
Another prerequisite was that all clients and servers had to be able to resolve all Exchange servers by name. Fortunately, the bank already had a good WINS installation. The Windows NT infrastructure team organized a hub-and-spoke-style replicated WINS infrastructure that enabled all clients to resolve all server names to IP addresses. As a backup measure, we put LMHOSTS files on all the Exchange servers so they could communicate in the event of a WINS failure.
Microsoft Outlook requires access to all servers in the site because it connects directly to any server that contains a required resource. For example, public folders, the free/busy information folder, and the Offline Address Book (OAB) folder can reside on any server in the site.
When you install multiple Exchange servers in the same site, the Exchange services must be authenticated using the same NT domain and service account on each server. Each continent on which the bank operates uses a separate account domain and various resource domains. The company's three top-level account domains already had two-way trusts between them, enabling the servers to share one service account.
We knew we'd meet all the technical prerequisites. We decided that the single site would work for several reasons:
- Only a few servers would be at each location.
- We intended to use Exchange Server 5.5, which provides more control at an intrasite level than previous versions did.
- The meshed WAN infrastructure closely matched the fully connected mesh of the Exchange Server site.
- The project team had a great deal of experience with Exchange Server systems.
- The Move Server Wizard provided a backup plan.
Our next task was to ensure that Exchange Server wouldn't swamp the network. Microsoft recently released several white papers and technical articles describing how Exchange servers communicate. For example, the Microsoft TechNet article "MS Exchange Server 5.5 Advanced Backbone Design and Optimization" by Paul Bowden is an excellent guide to Exchange Server's network requirements. The article suggests that an Exchange Server site can span large distances as long as the number of servers at the end of the links is small. For example, the oft-quoted breakpoints for a site are either 64Kbps or 128Kbps—but you can't have 50 servers at the end of a 64Kbps link and expect that link to be adequate.
We knew that each of our minor locations, with up to 200 users, wouldn't have more than one server. Our two major locations, London and New York, had three or four servers serving up to 800 users each and had excellent WAN links with up to 1Mbps bandwidth. With just one server at the end of each small WAN link, we determined that a single site wasn't going to swamp the network. Furthermore, the project team sent administrators at each location a questionnaire asking about employee email habits. We learned that 50 percent of email remained within the local post office, 25 percent remained within the originating region (i.e., Asia, United States, or Europe), 20 percent went to company addresses outside the originating region, and 5 percent went to the Internet (via a London-based connection). These results indicated that email-based WAN usage would remain low and that Internet mail traffic, which for Asia would continue to be via London, wouldn't present any problems.
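The questionnaire figures can be turned into a rough WAN load estimate. The sketch below is a minimal example under loud assumptions: the message volume, the average message size, and the even spread over an eight-hour day are all hypothetical, and the function names are invented for illustration. Only the percentage split comes from the text.

```python
# Traffic split reported by the questionnaire, in percent of all email.
TRAFFIC_SPLIT = {
    "local_post_office": 50,
    "same_region": 25,
    "other_region": 20,
    "internet": 5,
}


def wan_percent(split: dict) -> int:
    """Percent of mail that leaves the local post office."""
    return sum(share for kind, share in split.items() if kind != "local_post_office")


def average_wan_kbps(messages_per_day: int, avg_message_kb: float,
                     split: dict, business_hours: float = 8.0) -> float:
    """Average WAN load in Kbps if traffic were spread evenly over the working day."""
    wan_bits = messages_per_day * avg_message_kb * 1024 * 8 * wan_percent(split) / 100
    return wan_bits / (business_hours * 3600) / 1000


assert sum(TRAFFIC_SPLIT.values()) == 100
# Hypothetical minor office: 200 users sending 20 messages a day of 10KB each.
print(f"{wan_percent(TRAFFIC_SPLIT)}% of mail crosses the WAN")
print(f"{average_wan_kbps(200 * 20, 10, TRAFFIC_SPLIT):.1f}Kbps average WAN load")
```

An average hides peaks, so a figure like this is only a sanity check against the CIR of the link, not a substitute for testing with real traffic.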
Make sure you understand your users' use of the Internet when you design an infrastructure. Many financial institutions tend to be on the leading edge of technology and are increasing their use of the Internet. We had to ensure that our network would allow growth in Internet email usage, particularly with the improvements in Internet delivery efficiency and ease of use that Outlook 98 provides.
In global WANs, the time difference between locations can be beneficial. The WAN links between our Asian offices and London vary from 48Kbps to 256Kbps, which could cause bandwidth problems at peak periods. However, Hong Kong, Singapore, and Tokyo are a full working day ahead of London and one-and-a-half working days ahead of New York, so the network contention remains minimal.
Despite our WAN analyses, email questionnaires, solid global network, and gut feeling that this project would work, the bank needed a back-out plan. If the design didn't work as planned, we needed to be able to reorganize the system with minimal disruption to the business. Microsoft provided this ability in late 1998 with the release of the Move Server Wizard. (For more information about the Move Server Wizard, see Tony Redmond, "How to Rebuild Your Exchange Organization," January 1999.) With previous versions of Exchange Server, if the site design didn't work, our only choice was to completely tear down the infrastructure and start again from the beginning. This process meant data loss, service downtime, and more expense. However, with the Move Server Wizard, we could easily split the servers into multiple sites in the unlikely event that the design failed. Figure 1 shows the design we planned and implemented.
Implementation and Testing
The implementation is smooth if you've done the planning and testing stages well. The global-site implementation was a fairly straightforward process. One problem we encountered was that our directory had about 40,000 entries in it, which meant our directory database was about 250MB. Getting this data to our remote servers quickly was going to require some extra work.
When you install Exchange Server and instruct it to join a site, it downloads just enough information to join the site and start all its services. Full directory replication of user details and server details takes place later when the server is operational. Because the remote servers would have to download about 250MB of directory information upon installation, we decided to build the servers in London, back up the DS and Information Store (IS) to tape, and perform a full server recovery in each location. When we started the services in each location, the only thing we had to download across the WAN was the directory information that had changed between our installation times in London and in Asia.
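A quick calculation shows why pulling the whole directory across the WAN was unattractive. This is a minimal sketch that treats the directory as a 250MB bulk transfer; actual replication traffic would not match the database size exactly, and the function name is hypothetical.

```python
def transfer_hours(size_mb: float, link_kbps: float) -> float:
    """Hours needed to move size_mb over a link running flat out at link_kbps.

    Assumes 1MB = 1,048,576 bytes and 1Kbps = 1,000 bits per second, and
    ignores protocol overhead and other traffic on the link.
    """
    bits = size_mb * 1024 * 1024 * 8
    return bits / (link_kbps * 1000) / 3600


DIRECTORY_MB = 250  # approximate directory size given in the article

for kbps in (48, 256, 1000):
    print(f"{kbps}Kbps link: about {transfer_hours(DIRECTORY_MB, kbps):.1f} hours")
```

At 48Kbps the estimate is roughly 12 hours with the link fully saturated, which would have left no room for mail.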
In each location, we used Mailstorm, a utility from the Microsoft BackOffice Resource Kit (BORK), 2nd Edition, to test how the network coped with realistic email traffic. Mailstorm isn't a simulation utility; it sends real email. The usage data we obtained from our questionnaires let us test the network using realistic traffic patterns. During the testing phase, we experienced no RPC timeouts and the Exchange Server machines reported no errors. We then sent numerous email messages of 9MB and smaller and watched the queue lengths reach 50 messages, then drop away suddenly as Exchange Server opened another association. (When the number of messages reaches a set limit, Exchange Server opens another MTA association.) When we were satisfied with the infrastructure, we began to migrate the users.
Exchange Server is now robust and usable out of the box. However, to maximize its potential for our system, we made several changes to its default settings.
MTA and DS Registry changes. Clearly, a single global site pushes Exchange Server to its limits. We had to make several Registry changes to ensure that Exchange Server wouldn't swamp the WAN.
Depending on the options you configure in the Performance Optimizer, Exchange Server allocates a pool of between one and three MTA kernel threads per server. MTA kernel threads manage the communications channels, or associations, between MTAs. In a large site or a site with slow network links, the MTA can become backlogged if all available kernel threads are in use. For example, in this site, if a user on a London server simultaneously emails a 2MB attachment to users in Tokyo, Singapore, and Hong Kong, the activity might use up the pool of available kernel threads on that London server. When no threads are free, Exchange Server can't process messages in the MTA, even to or from another London-based server on the 100Mbps LAN. Following advice from Microsoft, we carefully changed the Registry setting in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeMTA\Parameters according to the values that Table 1 shows. The default value of three kernel threads is normal. Under most circumstances, you don't need to change this value.
The DS also required some Registry modification. By default, any change to a directory object will cause Exchange Server to notify all servers in the site of the change after 300 seconds. Exchange Server will wait 30 seconds between notifying each server of the change. To prevent Exchange Server from overloading the network and the DS of the server sending out the updates, we changed the Registry settings in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeDS\Parameters, as Table 2 shows.
If certain Exchange Server components (e.g., the MTA or the DS) haven't communicated with another similar component within 15 minutes, they must use NT authentication to reauthenticate. The Registry changes we made forced Exchange Server to bundle directory updates, thus reducing network traffic associated with the authentication process, which could otherwise occur every five minutes. The changes also reduced the chance that the sending of an important email message would coincide with directory updates on the network.
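The default notification timing described above can be sketched as a simple schedule. This is a simplified model that assumes servers are notified strictly one after another; the parameter names are hypothetical Python names, not the Registry value names, and the server count is invented.

```python
def notification_schedule(other_servers: int,
                          pause_after_change: int = 300,
                          pause_between_servers: int = 30) -> list:
    """Seconds after a directory change at which each other server is notified."""
    return [pause_after_change + i * pause_between_servers
            for i in range(other_servers)]


# Hypothetical site with 14 other servers.
schedule = notification_schedule(14)
print(f"first server notified after {schedule[0]} seconds")
print(f"last server notified after {schedule[-1]} seconds")
```

Raising either pause stretches the schedule, which spreads the load on the WAN and on the originating DS at the cost of slower directory convergence.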
MTA Association Threshold. The Messaging Defaults tab in the MTA Site Configuration Properties screen of the Exchange Administrator program includes several configurable parameters for fine-tuning the MTA, as Screen 1 shows. An important parameter for sites spanning large WANs is the Threshold (msgs) field in the Association parameters section. This field shows the number of messages that must be in the queue before the MTA opens another association to the remote server, thus allowing the backlog of messages to clear more quickly. The Threshold (msgs) field defaults to 50. Lower the number in this field if large email messages often hold up smaller messages waiting in the queue.
Changing this number involves trade-offs, so make sure you thoroughly understand your current and expected mail traffic patterns and the configuration of your site before changing this parameter. If you set this value too low, Exchange Server will open multiple associations with the same server and choke communications with the other servers and connectors in the site. If you set the value too high, the MTA will wait longer for the queue to build before opening another association, which might also delay communications. We left our setting at 50. NatWest GFM's MTA queues haven't built up excessively, but the company watches them closely.
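The trade-off can be illustrated with a toy queue model. This sketch is not how the MTA actually schedules work: the tick-based timing, the fixed delivery rate per association, and the rule of opening one extra association per tick while the queue is at or above the threshold are all assumptions made for illustration.

```python
def simulate_mta_queue(arrivals, drain_per_association, threshold=50):
    """Toy model of one MTA queue to a single remote server.

    Each tick, new messages arrive, every open association delivers
    drain_per_association messages, and one more association is opened
    if the queue is still at or above the threshold.
    Returns (associations_opened, peak_queue_length).
    """
    queue = 0
    associations = 1
    peak = 0
    for arriving in arrivals:
        queue += arriving
        queue = max(0, queue - drain_per_association * associations)
        peak = max(peak, queue)
        if queue >= threshold:
            associations += 1
    return associations, peak


# Hypothetical burst: 10 messages per tick for 6 ticks, then silence.
burst = [10] * 6 + [0] * 6
for threshold in (50, 20):
    opened, peak = simulate_mta_queue(burst, drain_per_association=4, threshold=threshold)
    print(f"threshold {threshold}: {opened} association(s), peak queue {peak}")
```

In this model the lower threshold clears the backlog sooner but opens several associations to one server, which is the behavior the paragraph above warns can starve other servers and connectors.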
Server naming. Every 3 hours, Exchange Server runs an intrasite Knowledge Consistency Checker (KCC). This process ensures that all Exchange Server machines in the site have the same directory information. The 3-hour interval is hard-coded into Exchange Server, so you can't change it via Exchange Administrator or the Registry.
Although you can't change the amount of data Exchange Server transmits during this process, or the frequency with which it occurs, you can optimize the procedure if you understand how it works. The KCC process moves through the servers alphabetically according to their names. The first Exchange Server machine on the list communicates with the second server. The two servers compare their knowledge. If the servers' data is inconsistent in any way, the directory services initiate replication. The process then moves to the second server in the list, which communicates with the third server. This process continues until Exchange Server has checked each server for consistency and the last server has compared itself with the first one.
In a large WAN environment, name the servers so that the KCC communications proceed by the optimal route. Determine your least-utilized links between servers, then name the servers so KCC will use these links.
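The effect of naming on the KCC route can be sketched as follows. The server names, locations, and relative link costs are hypothetical, and treating "alphabetically" as a case-insensitive sort is an assumption of this sketch.

```python
def kcc_ring(server_names):
    """Pair each server with its alphabetical successor, wrapping at the end."""
    ordered = sorted(server_names, key=str.upper)
    if len(ordered) < 2:
        return []
    return [(ordered[i], ordered[(i + 1) % len(ordered)])
            for i in range(len(ordered))]


def ring_cost(naming, link_cost):
    """Total cost of the ring; naming maps server name -> location."""
    return sum(link_cost[frozenset((naming[a], naming[b]))]
               for a, b in kcc_ring(naming))


# Hypothetical relative costs: higher means a busier or slower link.
LINK_COST = {
    frozenset(("London", "New York")): 1,
    frozenset(("London", "Singapore")): 2,
    frozenset(("London", "Tokyo")): 2,
    frozenset(("New York", "Singapore")): 5,
    frozenset(("New York", "Tokyo")): 5,
    frozenset(("Singapore", "Tokyo")): 3,
}

scheme_a = {"EXCH-LON": "London", "EXCH-NYC": "New York",
            "EXCH-SIN": "Singapore", "EXCH-TOK": "Tokyo"}
scheme_b = {"EXCH1": "London", "EXCH2": "Singapore",
            "EXCH3": "New York", "EXCH4": "Tokyo"}

for label, scheme in (("A", scheme_a), ("B", scheme_b)):
    print(f"scheme {label}: ring {kcc_ring(scheme)} cost {ring_cost(scheme, LINK_COST)}")
```

The same four servers produce different rings under the two naming schemes, so the names alone decide whether the check crosses the expensive links once or twice.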
Public folders. One of our biggest concerns was that users would swamp the WAN through heavy use of public folders. When a user selects a public folder, Exchange Server searches for that folder according to an algorithm-determined search order. (For more information about this algorithm, see the Microsoft article "XADM: How to Determine Which Public Folder Replica is Used" at http://support.microsoft.com/support/kb/articles/q154/9/41.asp.) First, Exchange Server searches the Public Folder server defined in the user's home server's private IS. If the public folder isn't on this server, Exchange Server tries any other server in the site that contains a replica. If such a server is unavailable, Exchange Server tries the others. In short, if a user wants to view the contents of a public folder that is in the site, but not on the local Exchange Server computer, you can't prevent that user from connecting to any other server in the site.
The only way around this problem is to implement an efficient folder hierarchy and strictly enforce permissions on the folder so that only local users have read access. For example, under the top-level folder (each company in the organization has one top-level folder) is a subfolder for each business division (e.g., Credit, Finance, Technology). Under these folders are four location folders: Europe, America, Asia, and Global. Exchange Server replicates any subfolders under Global to all servers in the site for rapid local access, but it doesn't replicate other folders. This system has been effective for NatWest GFM. Chapter 6 of the Exchange Server 5.5 Resource Guide on any recent TechNet CD-ROM is an excellent source of information about public folder location and storage.
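The replica search order described earlier can be sketched in a few lines. This is a simplified illustration based only on the description in this article, not on the referenced Microsoft algorithm in full; the function name and server names are hypothetical, and the order among the non-preferred servers is simply taken as given.

```python
def pick_replica_server(home_pf_server, replica_servers, reachable):
    """Return the first reachable server holding a replica, or None.

    The public folder server configured for the user's home server is
    tried first, then any other server in the site that holds a replica.
    """
    candidates = []
    if home_pf_server in replica_servers:
        candidates.append(home_pf_server)
    candidates.extend(s for s in replica_servers if s != home_pf_server)
    for server in candidates:
        if server in reachable:
            return server
    return None


up = {"LON1", "NYC1", "TOK1"}
# A folder replicated everywhere is served locally.
print(pick_replica_server("TOK1", ["LON1", "NYC1", "TOK1"], up))
# A folder held only in London pulls the Tokyo user across the WAN.
print(pick_replica_server("TOK1", ["LON1"], up))
```

The second case is the one that permissions and the folder hierarchy are meant to contain, because nothing in the search order itself keeps the client off a remote server.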
Free/busy folder replication. By default, Outlook 98 updates the free/busy folder every 15 minutes and each time a user shuts down a client machine. Outlook holds the free/busy information in a system public folder. By default, only one free/busy folder is in each site. To prevent clients from overloading the WAN with free/busy updates, we replicated this folder to every server in the site.
Limits. Always implement limits on your disk quota, connectors, and MTA, even if you feel that your company has plenty of disk space or network bandwidth. Without limits, you'll always be susceptible to Denial of Service (DoS) incidents. For example, without disk quota limits, you will be susceptible to a server shutdown attack. With a little skill, an employee could fire a large email message throughout the company enough times to completely fill the server's disk space and cause the Exchange services to shut down. If you don't have limits on your connectors or on your MTA, someone outside the company could initiate this type of attack. Make sure you implement limits—even if they're measured in hundreds of megabytes.
In Exchange Server, you can enforce message size limits in three different places: the MTA, the connectors, and individual users' property pages. NatWest GFM has a 10MB limit on all MTAs and a 10MB limit to and from the Internet. If users need to transmit larger files, they can copy a file to a shared directory and email the directory's link to the intended recipient. The recipient can then copy the file when network utilization is low. Although users might legitimately need to send very large messages to other users on the same server, we had to establish a limit to prevent a DoS attack. Therefore, on users' property pages, we set an outgoing and incoming limit of 500MB.
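How the three limits combine can be sketched as taking the smallest limit that applies along a message's route. The limit values come from the text; which limits apply to which route is an assumption of this sketch, and the route names are hypothetical.

```python
MTA_LIMIT_MB = 10        # limit on all MTAs
INTERNET_LIMIT_MB = 10   # limit to and from the Internet
USER_LIMIT_MB = 500      # per-user outgoing and incoming limit

# Assumed mapping of routes to the limits a message meets on the way.
ROUTE_LIMITS = {
    "same_server": (USER_LIMIT_MB,),
    "other_server": (USER_LIMIT_MB, MTA_LIMIT_MB),
    "internet": (USER_LIMIT_MB, MTA_LIMIT_MB, INTERNET_LIMIT_MB),
}


def effective_limit_mb(route: str) -> int:
    """Smallest limit that applies on the given route."""
    return min(ROUTE_LIMITS[route])


def is_deliverable(size_mb: float, route: str) -> bool:
    """True if a message of size_mb fits under every limit on the route."""
    return size_mb <= effective_limit_mb(route)


for route in ROUTE_LIMITS:
    print(f"{route}: effective limit {effective_limit_mb(route)}MB, "
          f"50MB message allowed: {is_deliverable(50, route)}")
```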
We defined users as normal, power, and super power users, as Table 3 shows. The Prohibit Send and Receive value might seem high, but the bank's management was adamant that users not be prevented from receiving important email. The project team advised that the risk of a DoS attack was slim, but an attack would affect everybody. The compromise was a high Prohibit Send and Receive value of 1GB.
Monitoring. To make any large deployment work, you must be able to monitor your Exchange Server organization. You must always know how many mail messages are in your queues, how heavily utilized your servers are, and how much mail flows through the entire system. You must implement a monitoring solution that lets you know a problem exists before users publicly identify the problem.
On a workstation, the bank implemented a sophisticated monitoring suite using NT's Performance Monitor. This workstation provides realtime graphical feedback while it monitors the system's various components. These components, in turn, provide an overview of the system's health.
For example, we monitor disk space levels on all servers, queue lengths on all MTAs and connectors, and processor and disk activity on all servers. The London-based workstation monitors in detail all the local servers but monitors only key remote components. Performance Monitor is an excellent tool to improve service-availability times.
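The idea of catching a problem before users do comes down to comparing sampled counters with thresholds. The sketch below is a generic illustration, not the bank's Performance Monitor configuration; the counter names, threshold values, and sample data are all hypothetical.

```python
# Hypothetical alert thresholds; the article does not state the bank's values.
THRESHOLDS = {
    "free_disk_mb": ("min", 500),
    "mta_queue_length": ("max", 50),
    "cpu_percent": ("max", 90),
}


def find_alerts(samples):
    """Return (server, counter, value) for every sample outside its threshold."""
    alerts = []
    for server, counters in samples.items():
        for counter, value in counters.items():
            kind, limit = THRESHOLDS[counter]
            breached = value < limit if kind == "min" else value > limit
            if breached:
                alerts.append((server, counter, value))
    return alerts


# Invented samples: full detail for a local server, key counters for a remote one.
samples = {
    "LON1": {"free_disk_mb": 200, "mta_queue_length": 10, "cpu_percent": 95},
    "TOK1": {"mta_queue_length": 75},
}
for server, counter, value in find_alerts(samples):
    print(f"ALERT {server}: {counter} = {value}")
```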
When we began this project, we knew we wanted to push the limits. NatWest GFM enjoys pioneering boundary-pushing efforts. Combining that spirit with the latest developments from Microsoft, we came up with something useful.
Executives in this financial institution can now fly to any of their offices on three continents and find their mailboxes waiting for them. Internally, the company touts global message delivery within 15 minutes, but standard messages rarely take more than 10 seconds.
Today, lost mail and delays in delivering intraorganization mail are inexcusable. Exchange Server has become a mature product that gives companies the functionality and reliability they've always dreamed about. Now systems architects need to begin using this technology properly.