How do you plan for an implementation of Microsoft Exchange Server? What steps can you take to ensure success, or at least avoid some pitfalls along the way? I can easily discuss this topic at length, certainly beyond what can fit in one magazine article. Instead, I'll just focus on the most important aspects of deployment planning for Microsoft Exchange Server, reflecting on my experience of the past 18 months working with the product in a variety of corporate situations. Some ideas I'll discuss here won't be new to anyone who has installed Exchange, but some ideas might surprise you. You'll know what makes sense for your environment. My comments are generic whereas your knowledge isn't, so your own experience will always win out in the end. The four most important aspects of any Exchange deployment are Windows NT infrastructure, network connections and bandwidth, the shape of the Exchange organization, and server connections.
Organizing Exchange in NT
Computers run Exchange within a somewhat inflexible hierarchical arrangement
known as an organization. An organization is subdivided into sites,
which are closely connected groups of server computers that communicate
continually. Screen 1 shows an Exchange organization with 10 sites. An Exchange
organization is mapped onto the NT infrastructure and the available network. The
easiest Exchange implementation follows two simple principles:
1. All servers operate inside the same NT domain. In large implementations, this principle usually means that you create a separate resource domain for Exchange. You do not create user accounts in the resource domain, only an account for Exchange administration (the service account).
2. You install and operate Exchange on dedicated servers. In particular, no potentially contending database-type application such as Systems Management Server (SMS) or SQL Server runs on the same computer as Exchange. Ensure that the server is neither a Primary Domain Controller (PDC) nor (less important) a Backup Domain Controller (BDC). To refine this principle, you can configure some Exchange servers to handle messaging and some to run connectors.
The real skill in Exchange design comes in knowing when to compromise one principle to arrive at a pragmatic design that meets your needs and is flexible enough to permit evolution. For example, you can connect Exchange servers across multiple NT domains. Many people do so because their NT domain structure wasn't well planned or because they decided to separate users and computers into different domains.
Exchange's basic method of communication is to send messages between servers, and as long as a messaging connection is possible (for example, sending Simple Mail Transfer Protocol (SMTP) messages between servers), messages will flow. Messages include directory replication data, public folder hierarchy and content replication data, and the interpersonal notes that users send each other.
Having one unified security context is best (the result of placing all Exchange servers into the same domain) because the finer points of Exchange can operate without hindrance. These points include single-seat server administration, public folder affinity, and message tracking.
Public folder replication gets a lot of attention, largely because of the inevitable comparisons people make with the replication mechanism of Lotus Notes. However, public folder affinity, which is the ability to direct all access to public folder contents to one or more predefined (and costed) points within the Exchange organization, is valuable because it reduces the amount of duplicated data floating around the network. Public folder affinity also allows more control over document content that you don't want replicated.
Affinity depends on clients having access to the servers where the content is stored, which can be outside the client's NT domain. Access across domains requires a trust relationship. Better still, if all of the Exchange servers share a unified security context, affinity can proceed on automatic pilot because the client's security credentials are acceptable to all servers within the organization. Of course, you can establish a unified security context through two-way trust relationships between NT domains, but this approach is hardly elegant and not viable when more than three or four domains are involved.
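A rough count shows why trust relationships stop being viable so quickly. The sketch below assumes that every domain must trust every other domain and that each two-way trust is built from two one-way trusts. The function names are mine, for illustration only.

```python
def two_way_pairs(domains: int) -> int:
    """Number of domain pairs that need a two-way trust between them."""
    return domains * (domains - 1) // 2


def one_way_trusts(domains: int) -> int:
    """Number of one-way trusts to create, two per domain pair."""
    return 2 * two_way_pairs(domains)


if __name__ == "__main__":
    # The count grows with the square of the number of domains.
    for n in (2, 3, 4, 6, 10):
        print(f"{n:>2} domains: {two_way_pairs(n):>2} pairs, "
              f"{one_way_trusts(n):>2} one-way trusts")
```

Four domains already need 12 one-way trusts, and every one of them must be created and maintained by hand.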
Operating Exchange on dedicated computers is the luxury approach and valuable when things go wrong. Take email, which is now a mission-critical application for many companies. When email servers go down, users demand immediate reinstatement and give little credit to MIS departments that can't fulfill that demand.
When things go wrong, you want a simple checklist of what to do to restore service quickly, instead of having to mess around to rebuild a complex server. I know instances where getting a server back online after hardware failures took two or more days. Such delay is unacceptable when the CEO is waiting for email. You can operate dedicated servers and make sure you protect those servers with UPS and RAID devices.
Accidents happen, computers fail, and software has bugs. These three truths of computing mean that you need to be sure that you can get the company email system online quickly after catastrophic hardware failures, minor hardware failures, and botched software upgrades or other accidents of systems administration life.
The Fight for Bandwidth
No one has enough network bandwidth. Everywhere we turn, applications are
absorbing network capacity, and Exchange is no exception. To plan for Exchange,
keep a few factors in mind.
Exchange transports more messages than you might expect, including interpersonal mail, directory updates, configuration updates, and public folder content and hierarchy. Don't expect your experience with another messaging system to accurately reflect just how much data Exchange will move around. For example, Microsoft Mail and Lotus cc:Mail concentrate on interpersonal mail only, so statistics you extract from these systems won't tell you how many messages will travel between Exchange servers.
The default replication schedule often generates too much replication activity. To take control of replication activity, minimize the number of replicas of public folders that you maintain across an organization and define more appropriate replication schedules for the directory and public folders. For instance, if your directory entries don't change often, you don't need a replication schedule with a 15-minute interval between updates. Define a 2- or 3-hour schedule instead. But during migration periods when directory updates occur very frequently as new mailboxes are added, you'll need more regular updates to let people see the new users in the directory.
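A back-of-the-envelope sketch shows how much activity a longer interval removes. It counts only the number of replication cycles per day for a given interval. It assumes the schedule runs around the clock and says nothing about how much data each cycle carries. The function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def cycles_per_day(interval_minutes: int) -> int:
    """Replication cycles in 24 hours for a fixed interval between updates."""
    if interval_minutes <= 0:
        raise ValueError("interval must be a positive number of minutes")
    return MINUTES_PER_DAY // interval_minutes


if __name__ == "__main__":
    # Compare the 15-minute interval with 2-hour and 3-hour schedules.
    for minutes in (15, 120, 180):
        print(f"every {minutes:>3} min: {cycles_per_day(minutes):>2} cycles per day")
```

Moving from a 15-minute interval to a 3-hour schedule cuts 96 cycles a day to 8.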
In addition to transporting messages, Exchange servers use remote procedure calls (RPCs) to talk together inside sites. However, you can't determine just how much bandwidth servers will absorb as they chat. For example, any change made to a server's configuration or a change to the site configuration from an individual server (or workstation running the administration program) will be dispatched via specially-coded messages to all the servers within the site. This mechanism ensures that all servers maintain a complete picture of the site configuration. Exchange servers use the same type of mechanism, albeit at a more leisurely pace, to exchange configuration data between the different sites in an organization, so that each server knows about the other sites and knows the configuration details of those sites.
Factors such as the percentage of messages that travel off a server, the frequency of directory updates, the number of servers within a site, the quality of the network links that connect the servers, the use of public folders, and the behavior of individual users affect network use. You might be surprised at how much Exchange servers communicate with their counterparts within a site, similar to the way that NT domain controllers synchronize with each other by exchanging updates to the Security Accounts Manager (SAM) every 5 minutes (by default). For example, within a site, all Exchange servers automatically synchronize directory changes with each other. Thus, a directory entry made on one server will be replicated to all other servers within the same site after 5 minutes, or shortly afterward if the network or servers are heavily loaded.
The effect that user behavior can have on network load is an interesting topic, especially if you're moving from a green-screen email system. The average message size continues to grow. Yesterday's simple 2KB message is today's 10KB message and tomorrow's 40KB message. People use the facilities available to them, and the Exchange and Outlook clients encourage users to embellish their email with fonts and colors and to attach any file that they care to send to their friends. I have known users to attach files larger than 20MB to messages and expect the server to faithfully deliver the message to a large distribution list. The auto-signature option lets you automatically append cute sign-off text to each message, and you can even append graphics. Many people insist on including company logos in their auto-signatures, driving up the average size of messages to more than 100KB. Clearly, user training can positively influence such behavior, but self-tutoring Windows applications eliminate many formal training opportunities for enforcing good habits and eliminating bad habits.
Given that a network might not be able to handle the load that new technology and bad user habits impose, what type of network links do you need to put in place? The classic answer is that servers within a site operate on the expectation that a permanent, LAN-quality link is in place. If you can't compare your connections to the bandwidth delivered by a LAN (or high-quality WAN), don't bother connecting servers into a widely distributed site. Make sure that every server in a site has access to at least 64Kbps of good-quality bandwidth to let them communicate with their peers. If you don't provide adequate bandwidth, servers can't transmit RPCs, messages won't get through, and message queues will build up rapidly.
Sizing Sites
What size should a site be and how many sites should you plan for in an
organization? The answer depends on the quality of your network links: If, like
Microsoft and Digital, you have a network based on T1 and T3 links rather than
64Kbps links, you can consider building a very large North American or European
site. But when bandwidth becomes scarce, you must consider other options.
At the start, try to create as few sites as possible. Two factors reinforce this approach: First, no administration tools for cross-site operations are available today; second, cross-site operations (e.g., moving a user) are manual and time-consuming. As a result, many designs combine servers into very large sites to minimize cross-site operations. The largest Exchange implementations today, at Microsoft and Digital, both operate very large sites in North America.
In countries such as those in the former Soviet Union, you can't connect servers in one site because you can't get the necessary network links, even if you can afford to pay for them. The same restriction applies in some locations in South America and the Asia/Pacific region. In all these cases, you must create several sites, perhaps one for each location. All the global deployments I know of have large sites in North America, smaller (but still large) sites in Europe, and the smallest sites in the Asia/Pacific region. Exceptions occur where local conditions permit availability of cheap bandwidth.
Within a site, you can run from 1 to more than 100 servers. The issues involved in running more than 10 servers are chiefly operational, such as keeping track of what all the servers are doing. For example, Microsoft has a site with more than 160 servers. I don't expect you to have the same backup resources (the entire Exchange development team), so restrain your enthusiasm and limit yourself to smaller sites.
Connecting Exchange
You connect sites with connectors, predefined links that tell
Exchange how messages flow from one site to another. You have four options: the
direct, RPC-based site connector (usually called the site connector, a
term that often confuses people new to Exchange because you can use all the
connectors to link sites); the X.400 connector; the Internet or SMTP connector;
and the Dynamic Remote Access Service (RAS) connector. Screen 2 shows connectors
linking Exchange sites.
If you don't have very reliable, fast network connections between sites, the site connector is not a viable option. The connector uses RPCs between servers in the different sites to exchange messages. If the network is incapable of carrying the RPCs to the target servers, large message queues will build up. Many people start with site connectors because they have a reliable network link in place but find that the link proves troublesome under the strain of a production workload. In other words, a pilot project that runs without problems might not predict how the link behaves in production.
Many consultants, including Microsoft, recommend a minimum of 56Kbps or 64Kbps available bandwidth for a site connector. A recommendation to use a particular bandwidth is somewhat arbitrary because this number is a starting point only. You must increase or throttle back to reflect the load in your environment. Some companies find that they need 128Kbps or 256Kbps links for site connectors to perform reliably; some anecdotal evidence suggests that a site connector can run across a 9.6Kbps link. Of course, 9.6Kbps and 256Kbps links represent a radical difference in capabilities, and the former is viable only if a very small number of messages pass across the link each day. Sites that experience heavy network traffic, act as a central site for Internet or X.400 connectors, or serve as bridgehead sites for directory synchronization with other (external) directories all need large network pipes if they don't want large message queues to build up.
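To get a feel for when queues start to build, you can compare the message arrival rate with what a link can carry. The sketch below is a simplification under my own assumptions: 1Kbps means 1,000 bits per second, the whole link is available to Exchange, and protocol overhead is ignored. The traffic figure of 1,000 messages per hour is hypothetical; the 10KB average message size matches the figure quoted earlier in this article.

```python
def link_capacity_msgs_per_hour(avg_msg_kb: float, link_kbps: float) -> float:
    """Messages per hour the link can carry, ignoring protocol overhead."""
    bits_per_msg = avg_msg_kb * 1024 * 8
    return link_kbps * 1000 * 3600 / bits_per_msg


def backlog_growth_per_hour(msgs_per_hour: float, avg_msg_kb: float,
                            link_kbps: float) -> float:
    """Messages added to the queue each hour; 0.0 if the link keeps up."""
    capacity = link_capacity_msgs_per_hour(avg_msg_kb, link_kbps)
    return max(0.0, msgs_per_hour - capacity)


if __name__ == "__main__":
    # Hypothetical load: 1,000 messages per hour averaging 10KB each.
    for kbps in (9.6, 64, 256):
        growth = backlog_growth_per_hour(1000, 10, kbps)
        print(f"{kbps:>5} Kbps: queue grows by {growth:.0f} messages per hour")
```

Under these assumptions a 9.6Kbps link falls behind by several hundred messages every hour, whereas the 64Kbps and 256Kbps links keep up with room to spare. Substitute your own measured traffic before drawing conclusions.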
Because of their direct server-to-server RPC-driven links, site connectors are the easiest type of connector to configure, and they let several servers in each site be points of contact. However, over a site connector, you cannot control the network traffic that passes between servers, and no tools exist to analyze what passes over the link when you're in production. The X.400 connector comes into its own when you're concerned about the capability of the network or you want to schedule connections.
Some people in the US don't seem to like the X.400 and X.500 standards. Europe is different, probably because Europeans have had to deal with international boundaries, multinational character sets, and other blocks to connectivity. Much of the internal working of Exchange stems from the concepts expressed in the X.400 and X.500 recommendations, and you can use these technologies for a major deployment of Exchange without anyone outside the implementation team detecting that X.400 plays an important part in the Exchange architecture. For example, the Exchange Mail Transfer Agent (MTA) is based completely on the X.400 recommendations. The X.400 connector is the connector of choice in low-capability networks. We see a lot of low-bandwidth connections in Europe and the Asia/Pacific region, and X.400 connectors are popular in deployments there. Screen 3 shows an X.400 connector, which is robust over extended links such as Dublin to Kuala Lumpur.
Some people argue that the Internet connector offers the same type of functionality as the X.400 connector and is easier to set up. The Internet connector has fewer property pages to complete when you create a new connector and is equally capable of connecting sites. However, the Internet connector's SMTP roots prevent it from offering scheduled connections. Messages sent over both the X.400 and Internet connectors must be converted from Exchange internal format to either P2/P22 (X.400) or SMTP/MIME (Internet) before they are dispatched, meaning that both connectors are slower than the site connector. The overhead of format translation has been measured at 20 percent to 25 percent, but your mileage will vary. In any case, if you don't have the network to support site connectors and you are unwilling to pay for an upgrade, you'll pay somewhere else--in this case, by accepting the overhead of format translation.
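As a rough illustration of what that overhead means in practice, the sketch below treats the 20 to 25 percent figure as extra processing time per message. That reading is my assumption about the measurement, and the baseline of 1,000 messages per hour is a made-up number, not a benchmark.

```python
def throughput_with_overhead(baseline_msgs_per_hour: float,
                             overhead: float) -> float:
    """Messages per hour once each message takes (1 + overhead) times as long."""
    if overhead < 0:
        raise ValueError("overhead must not be negative")
    return baseline_msgs_per_hour / (1.0 + overhead)


if __name__ == "__main__":
    baseline = 1000  # hypothetical site connector rate, messages per hour
    for overhead in (0.20, 0.25):
        rate = throughput_with_overhead(baseline, overhead)
        print(f"{overhead:.0%} overhead: about {rate:.0f} messages per hour")
```

Read this way, a server that pushes 1,000 messages an hour through a site connector would manage somewhere between 800 and 833 through an X.400 or Internet connector.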
Dynamic RAS is the last port of call. By definition it is slow, and the speed of the modem connection at each end limits throughput. However, when you can't do anything else (e.g., you're waiting for a permanent network connection but want to get Exchange into production), you have no choice. Bear the following points in mind if you use Dynamic RAS:
- If possible, install Exchange on the first server for the site where you have a permanent network connection. Allow directory replication and backfill (the process that populates the directory with data about users, servers, and the Exchange organization) to complete before detaching the server from the network and transporting the computer to its final destination. This approach avoids very large queues of messages (mostly containing directory entries) building up across the slow modem link when the server joins the organization.
- Do not use public folder replication unless necessary. If you use public folder replication, make sure that the replication schedule is throttled back as far as possible (once or twice a day). Try to keep the available bandwidth for personal messages.
- Encourage users to behave responsibly and not send messages with large attachments to users in the site served by the Dynamic RAS connector. One large message can occupy a 28.8Kbps modem for a long time.
- Monitor MTA queues for the site carefully because large queues can quickly build up if the modem link drops unexpectedly.
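To put a number on "a long time," the sketch below estimates how long one message occupies a modem link. It assumes the modem sustains its full rated speed, that 1Kbps means 1,000 bits per second, and that protocol overhead, compression, and retries are all ignored, so real transfers will differ. The function name is illustrative.

```python
def transfer_minutes(size_mb: float, link_kbps: float) -> float:
    """Minutes to send one message of size_mb megabytes over the link."""
    if link_kbps <= 0:
        raise ValueError("link speed must be positive")
    bits = size_mb * 1024 * 1024 * 8
    seconds = bits / (link_kbps * 1000)
    return seconds / 60


if __name__ == "__main__":
    # A 1MB attachment and the 20MB attachment mentioned earlier,
    # both sent over a 28.8Kbps modem.
    for size_mb in (1, 20):
        minutes = transfer_minutes(size_mb, 28.8)
        print(f"{size_mb:>2}MB over 28.8Kbps: about {minutes:.0f} minutes")
```

Even under these generous assumptions, a single 20MB message ties up the link for more than an hour and a half, and every other message waits in the queue behind it.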
Specialized Sites
Sometimes you need to dedicate a specialized site to a particular purpose or
group of people, and you want to create a separate management environment. For
example, suppose you want to operate a separate site for your company's
executive staff and limit the number of people with administrative privileges
over that site.
Another example of a specialized site is the connector site, a site dedicated to message exchange with other systems such as Microsoft Mail, Lotus cc:Mail, Fax, Internet, and X.400. A connector site has at least two servers (to provide some resilience). Ideally, each server in the connector site needs to be able to handle the total messaging load so that if a server is taken offline, normal service can continue. You can configure some connectors, such as those handling SMTP mail, to be incoming, outgoing, or both; so inside the site, you can configure one server to handle incoming mail from the Internet and the other to handle outgoing messages. The logic behind the connector site is simple: The connector site removes the relative complexity of connectors from the standard messaging servers. Administrators can more easily make changes, such as applying a Service Pack (SP) for either NT or Exchange to a server in the connector site, because they don't have to interrupt service to users. In addition, you can allocate systems management to people who really know Exchange and thus avoid the chance that someone who knows only the basics could change a connector and affect the whole organization. Of course, not everyone can afford separate servers just to run connectors, but if you're operating at the high end of the messaging scale, this idea deserves your consideration.
Operating Multiple Exchange Organizations
You don't have to create a single Exchange organization. In fact, many large
enterprises find agreeing to create a single organization difficult: Business
units or divisions opt to exercise a degree of autonomy and run Exchange with no
regard for what other units do. In theory, this situation is an unmitigated
disaster, but in practice, it's not so bad. When two or more organizations are
involved, you cannot implement some Exchange features (such as directory and
public folder replication), but the basic messaging functionality works just
fine across any number of Exchange organizations. Sure, you won't be able to use
the site connector, but the X.400 or Internet connectors do a more than adequate
job of linking servers into what appears to users as a seamless messaging
environment.
As Exchange and NT evolve, I believe Microsoft will address two issues that multiple organizations pose: automated methods to share directory and public folder information across organizations, and tools to merge, split, and join organizational hierarchies to form new organizations. Exchange and NT will eventually share the same X.500-based directory (the Active Directory in NT 5.0), and at that point, we might be able to join, graft, and split entities (such as domains or organizations) from the directory. After all, corporations don't retain the same business shape all the time, so why should their messaging system assume that they will?
Evolving Exchange
Exchange hasn't been out long. I don't think we have yet found all the
tricks and techniques that we can apply to extract the utmost performance from
Exchange, but we are moving along that path quickly.