Over the years, Microsoft has endeavored to expand the scalability, availability, and reliability of its server solutions. Clustering is a proven avenue to this objective, and Microsoft has embraced the notion of clustering, working to make it an integral part of Microsoft OSs and product offerings. With the delivery of Windows 2000, Microsoft's clustering solutions have matured significantly.
Scalability, Availability, and Reliability
A cluster is a group of independent computers that work together to run a common set of applications and to provide the image of a single system to the client and the application. The goal of clustering is to boost scalability, availability, and reliability across multiple tiers of a network.
Scalability is a computer's ability to handle increasing loads while maintaining acceptable performance. Hardware scalability (scaling up, in Microsoft parlance) relies on one large, extensible machine to perform work. Software scalability (scaling out) depends on a cluster of multiple moderately powerful machines working in tandem, not unlike a set of RAID drives. In fact, I've heard Microsoft representatives use the term redundant array of computers (RAC) to refer to the company's scale-out clusters. Just as you add disks to a RAID array to improve performance, you can add nodes to a scale-out cluster to improve performance.
Availability and reliability are closely related to each other but differ slightly. Availability is the quality of being present, ready for use, at hand, accessible. Reliability refers to dependability. Even the most reliable machine fails eventually. Hardware manufacturers prepare for failure by providing redundancies in key areas such as disk drives, power supplies, network controllers, and cooling fans. However, redundancy on one machine doesn't insulate users from application failure. If the database software on one server fails, that server might be reliable, but that software and server combination isn't available. Thus, a single machine can't meet all the necessary scalability, availability, and reliability challenges that a cluster can.
Here again, a cluster can mimic a RAID array in providing availability and reliability. In a fault-tolerant disk configuration such as RAID 1 or RAID 5, all the disks work together in a redundant array. If one disk fails, you unplug it and insert a new one; the rest of the array keeps running—with no configuration, no setup, and, most important, no downtime. The RAID system automatically rebuilds the new drive so that it will work with the others. Similarly, when a computer in a cluster fails, you can simply replace it with a new system and keep running. Some clustering software can automatically configure the server and integrate it into the cluster—all while the cluster stays available.
Four Clustering Solutions
Microsoft offers four basic clustering technologies: Microsoft Cluster Services (MSCS), Network Load Balancing (NLB), Component Load Balancing (CLB), and Application Center 2000. These services are delivered in three solutions: MSCS, NLB, and Application Center. CLB is part of and is available only with Application Center. You can use NLB with Application Center or as a standalone solution. Win2K Advanced Server and Win2K Datacenter Server include MSCS and NLB, but you must purchase Application Center separately.
Table 1 summarizes which of the four clustering technologies are available in the various members of the Win2K Server and Windows NT Server 4.0 family. As you might imagine, none of these technologies are applicable to Win2K Professional or NT Workstation 4.0. Table 2 lists some of the cluster technologies' characteristics. You can refer to this table as I compare and contrast the technologies below.
Microsoft Cluster Services
MSCS, first known by the code name Wolfpack, then by the name Microsoft Cluster Server, and now by Microsoft Cluster Services, was Microsoft's first foray into the world of clustering on NT and is arguably Microsoft's best-known clustering solution. In an MSCS cluster, the MSCS software connects up to four physical computers (two on Win2K AS, four on Datacenter) over a high-speed network. Typically, the clustered computers share a common storage subsystem and function in an "active-active" fashion. This means that all cluster computers (nodes) actively do work to share the load but can also take up the slack if one of the nodes fails. Figure 1 shows a 4-node MSCS cluster.
MSCS exists primarily to increase application availability through its failover capabilities. Failover is a cluster's ability to move processing from an application that has failed on one node to another, healthy node in the cluster. The failure can have causes ranging from failed hardware to software bugs. When the failed node is restored, the cluster should be able to fail the application back to its original node. MSCS manages the failover and failback of applications running on a cluster. Because an application's data resides on the shared storage that moves with it, the application resumes on the new node without losing committed data. This type of clustering is stateful clustering. In contrast, NLB, CLB, and Application Center provide stateless clustering and dynamic load balancing (which I discuss in more detail later), in addition to promoting availability.
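To make failover and failback concrete, here is a minimal Python sketch built on simplifying assumptions. It is not the MSCS API. The ResourceGroup and Cluster classes, the preference lists, and the node names are all hypothetical.

Each group runs on the first healthy node in its preference list. When that node fails, the group moves to the next healthy node. When the node returns, the group fails back. Real MSCS lets you set a failback policy per group and moves the shared disks along with the application; this sketch models neither.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """A group of resources (e.g., an Exchange 2000 instance) that fails over as a unit."""
    name: str
    preferred_owners: list[str]  # nodes in order of preference
    owner: str | None = None


@dataclass
class Cluster:
    nodes: dict[str, bool]  # node name -> currently healthy?
    groups: list[ResourceGroup] = field(default_factory=list)

    def _best_node(self, group: ResourceGroup) -> str | None:
        # First healthy node in the group's preference list, if any.
        return next((n for n in group.preferred_owners if self.nodes.get(n)), None)

    def start(self) -> None:
        for g in self.groups:
            g.owner = self._best_node(g)

    def node_failed(self, node: str) -> None:
        self.nodes[node] = False
        for g in self.groups:
            if g.owner == node:
                g.owner = self._best_node(g)  # failover to the next healthy node

    def node_restored(self, node: str) -> None:
        self.nodes[node] = True
        for g in self.groups:
            g.owner = self._best_node(g)  # failback to the most preferred healthy node


cluster = Cluster({"node1": True, "node2": True},
                  [ResourceGroup("Exchange", ["node1", "node2"]),
                   ResourceGroup("SQL", ["node2", "node1"])])
cluster.start()
cluster.node_failed("node1")    # Exchange moves to node2
cluster.node_restored("node1")  # Exchange fails back to node1
```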
MSCS is a good choice for running crucial applications such as email servers or database applications. Let's say you decide to run Microsoft Exchange 2000 Server on a 4-node MSCS cluster. After you install the MSCS software and cluster-aware version of Exchange 2000, you can configure the cluster so that Exchange 2000 will fail over to a backup node if a problem occurs on the primary node. Users will undoubtedly have sessions open on the main server when it fails, but MSCS performs the failover quickly and automatically, without losing any data. The backup node picks up the workload and the data from the failed node, and service to users continues.
MSCS also lets users keep working while you upgrade an application. Instead of taking an application down for the upgrade, you can perform a rolling upgrade (i.e., upgrade the application on one cluster node at a time while it remains available on the other nodes).

For example, say you have a 2-node cluster. Node 1 runs Exchange 2000 and node 2 runs Microsoft SQL Server. You've configured the cluster so that Exchange 2000 and SQL Server will fail over to the other node when necessary. When the time comes to upgrade SQL Server, you can use MSCS Cluster Administrator to move SQL Server from node 2 to node 1. While node 1 runs SQL Server (along with Exchange 2000), you upgrade the SQL Server software on node 2. Next, you fail SQL Server back from node 1 to node 2 and repeat the process with node 1's copy of the SQL Server software. The result: you've upgraded SQL Server without causing any downtime for users.
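The rolling-upgrade sequence above can be sketched as a simple loop. This is an illustrative model only. The rolling_upgrade function, the version strings, and the step log are hypothetical, and "moving" the application stands in for the failover you would trigger in Cluster Administrator.

The loop works as follows:
- It starts with the node that runs the application.
- It moves the application off each node before upgrading that node.
- At the end, it fails the application back to its original node if it isn't already there.

```python
from __future__ import annotations


def rolling_upgrade(versions: dict[str, str], owner: str, new_version: str) -> tuple[str, list[str]]:
    """Upgrade an application on each node in turn while it stays online.

    versions: node name -> installed version (hypothetical inventory).
    owner: node currently running the application.
    Returns the node running the application afterward and a log of steps.
    """
    if len(versions) < 2:
        raise ValueError("a rolling upgrade needs at least two nodes")
    home = owner
    log = []
    # Start with the node that runs the application, as in the SQL Server example.
    for node in [owner] + [n for n in versions if n != owner]:
        if node == owner:
            # Move the application to another node before touching this one.
            target = next(n for n in versions if n != node)
            log.append(f"move app {owner} -> {target}")
            owner = target
        versions[node] = new_version
        log.append(f"upgrade {node} to {new_version}")
    if owner != home:
        log.append(f"move app {owner} -> {home}")  # final failback
        owner = home
    return owner, log


versions = {"node1": "SQL Server 7.0", "node2": "SQL Server 7.0"}
final_owner, steps = rolling_upgrade(versions, "node2", "SQL Server 2000")
```

In the 2-node case, the second move (node 1 back to node 2) is the failback the text describes. It happens before node 1 is upgraded, so no extra failback is needed at the end.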
You don't typically use MSCS to scale an application for more users, as you do with the other three Microsoft clustering solutions. An MSCS cluster neither provides dynamic load balancing nor distributes applications across its nodes in a stateless, shared-nothing fashion as do NLB, CLB, and Application Center. In fact, the only real way to achieve application scalability with MSCS is to manually divide an application among the cluster resources during installation. For example, if you need to serve 5000 users on Exchange 2000, you can use a 2-node active-active cluster with 2500 users on each node. That way, you get the benefit of two servers handling the users plus availability in the event of failure. However, when a failover occurs, the remaining node must be able to handle all 5000 users until you can restore the failed node.
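The capacity arithmetic behind the 5000-user example can be checked with a few lines of Python. The function names are hypothetical. The sketch also simplifies: it assumes that each node's users belong to one resource group that fails over as a unit to a single surviving node.

```python
from __future__ import annotations


def split_users(total: int, nodes: int) -> list[int]:
    """Divide users as evenly as possible across active-active nodes."""
    base, extra = divmod(total, nodes)
    return [base + 1 if i < extra else base for i in range(nodes)]


def worst_case_single_failure(loads: list[int]) -> int:
    """Most users any one node may carry after a single node fails.

    Assumes each node's users form one resource group that fails over as a
    unit to one surviving node, so the worst case is the two largest loads
    ending up on the same machine.
    """
    if len(loads) < 2:
        raise ValueError("failover needs at least two nodes")
    largest, second = sorted(loads, reverse=True)[:2]
    return largest + second


loads = split_users(5000, 2)             # [2500, 2500]
peak = worst_case_single_failure(loads)  # 5000
```

With four nodes of 1250 users each, the same assumption gives a worst case of 2500 users on one node.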
Network Load Balancing
NLB, formerly known as Windows NT Load Balancing Service (WLBS), distributes incoming IP requests across multiple nodes running the NLB software. NLB provides scalability and availability for an IP-based application, such as a Web server. As user demand for server resources grows, NLB lets you add servers to handle the load.

For example, Exchange 2000 can use NLB to balance its Microsoft IIS-based front-end servers for Outlook Web Access (OWA), which offload work from the main Exchange 2000 servers. The front-end servers in the NLB cluster relay client requests to the back-end server or servers. If one NLB node goes down, the others pick up the extra load and users notice no interruption in service.
The underlying NLB software is a Network Driver Interface Specification (NDIS) driver that sits between the NIC driver and TCP/IP. You install the driver on each server in an NLB cluster. All NLB nodes share a virtual IP address that represents the desired network resource (e.g., the Web server).

Every NLB server receives every incoming request, but only one server responds. A fast hashing algorithm decides which server responds; it incorporates the client IP address, its port number, or both. The affinity setting controls which of these inputs the hash uses, so that, for example, all requests from one client reach the same server. A separate load-weight setting lets you specify that some servers should get more traffic than others.

A heartbeat feature lets all the NLB nodes know about any changes in the cluster, such as a failure or the addition of a node. When changes occur, NLB starts a convergence process that automatically reconciles the changes in the cluster and transparently redistributes the incoming load.
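The following Python sketch illustrates the filtering idea. Every host evaluates the same deterministic function on every packet and keeps only the packets that map to itself, so no host has to forward traffic to another.

This article doesn't describe NLB's actual hashing scheme, so CRC-32 (from Python's zlib module) stands in for it. Several other details are simplifications:
- The host names and integer weights are hypothetical.
- Only two affinity modes are shown, "none" and "single"; NLB's Class C affinity is omitted.
- Convergence is modeled simply by calling the function again with the updated host list.

```python
from __future__ import annotations
import zlib


def owner_for(client_ip: str, client_port: int, hosts: dict[str, int],
              affinity: str = "single") -> str:
    """Return the host that should handle this client's traffic.

    hosts maps host name -> integer load weight (hypothetical values; e.g.,
    {"web1": 2, "web2": 1} gives web1 about twice the traffic of web2).
    affinity "single" hashes the client IP only, so one client always reaches
    the same host; "none" hashes IP and port, spreading a client's
    connections across hosts.
    """
    if not hosts or min(hosts.values()) <= 0:
        raise ValueError("need at least one host with a positive weight")
    if affinity == "single":
        key = client_ip
    elif affinity == "none":
        key = f"{client_ip}:{client_port}"
    else:
        raise ValueError(f"unsupported affinity: {affinity}")
    # CRC-32 stands in for NLB's own hash function.
    slot = zlib.crc32(key.encode()) % sum(hosts.values())
    for host in sorted(hosts):  # same order on every host, so all hosts agree
        if slot < hosts[host]:
            return host
        slot -= hosts[host]
    raise AssertionError("unreachable: slot is always below the total weight")


def accepts(me: str, client_ip: str, client_port: int, hosts: dict[str, int],
            affinity: str = "single") -> bool:
    """Run on every host for every packet: keep it only if it maps to this host."""
    return owner_for(client_ip, client_port, hosts, affinity) == me


hosts = {"web1": 1, "web2": 1, "web3": 1}
before = owner_for("192.168.1.7", 1034, hosts)  # one of web1, web2, web3
del hosts["web2"]                               # web2 fails; the heartbeat notices
after = owner_for("192.168.1.7", 1034, hosts)   # now web1 or web3
```

Because each host computes the same answer independently, exactly one host accepts any given packet without the hosts exchanging per-request messages.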
NLB has its genesis in Microsoft's 1998 acquisition of Oregon-based Valence Research. Valence's Convoy Cluster Software became WLBS, which was an add-on product for NT Server 4.0 and NT Server 4.0, Enterprise Edition. In Win2K, Microsoft renamed and enhanced WLBS, but the core technology is still the same. NLB is an integral part of the network services in Win2K AS and Datacenter.
MSCS and NLB in Win2K work well together as long as you run MSCS and NLB on separate computers, for example, in the configuration that Figure 2 shows. Microsoft doesn't recommend running MSCS and NLB on the same computer and doesn't support doing so because of potential hardware sharing conflicts between MSCS and NLB. For information about uninstalling MSCS or NLB, see the Microsoft article "Windows 2000 Interoperability Between MSCS and NLB" (http://support.microsoft.com/support/kb/articles/q235/3/05.asp).
Component Load Balancing
CLB is a completely new technology. COM+, the next step in the evolution of COM, is new for Win2K as well. COM+ integrates COM, Microsoft Transaction Server (MTS), and system services with the goal of making Win2K a better platform on which to design, develop, deploy, and maintain component-based applications. Put simply, COM+ is COM with a bunch of system services, including services that let you distribute components across multiple systems. One COM+ service is the ability to load-balance access to COM+ objects. CLB is simply the load-balancing cluster: multiple servers that share the load of activating and executing COM+ objects.
The need for CLB, like the need for NLB, stems from availability and scalability requirements. When you run a critical application that consists of COM+ objects, a failure in the application or server causes serious problems. CLB ensures that the application will continue to run if a failure occurs and that the user won't experience a lapse in service. Furthermore, some COM+ objects can be large and fairly complex, and running them on a server along with other key applications such as IIS could bog down system performance. To provide scalability in this case, you could move the COM+ objects off the IIS servers and distribute them among multiple servers in their own CLB cluster.
Suppose you're a computer manufacturer with a commercial Web site where people come for product and technical information, product support, purchasing, and more. Users around the world work with your products 24 hours a day, so your Web site must be available and performing well all the time. You can take the approach that Figure 2 shows and run NLB on your Web servers with access to the back-end MSCS database cluster. However, let's say that much of the logic behind the services you provide is coded in COM+ objects. You could run those objects on the Web servers, but Web server response time might slow because the machine running the Web server also must process the COM+ objects. You probably need CLB.
Figure 3 depicts how you might deploy a CLB cluster in a highly available and scalable Web site. CLB balances the load of accessing the business logic, which COM+ objects in the application's middle tier provide. (A CLB cluster implicitly requires Application Center, which I explain in more detail in the "Application Center" section, but now you know why you would use CLB.)
CLB uses a combination of server response time and a round-robin algorithm to determine which server will handle the next request. CLB polls the COM+ servers in the cluster at regular preset subsecond intervals to determine how quickly the servers respond to the poll (their response time is directly linked to how busy they are). CLB then lists the servers in order by response time, with the fastest server at the top so that it will get the next COM+ activation request. Then, CLB distributes the work to the servers in the order they appear on the list until the next polling interval, when CLB reorders the activation list by server response time.
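CLB's routing rule, as described above, can be modeled with a small Python class. The class name ComponentRouter and its methods are hypothetical. The caller passes in the poll results rather than the class measuring them; in CLB, the polling happens automatically at the preset interval. The sketch also assumes that a server that doesn't answer a poll is skipped until it responds again.

```python
from __future__ import annotations
from itertools import cycle


class ComponentRouter:
    """Hypothetical model of CLB routing: rank servers by response time at each
    poll, then hand out activations round-robin over that ranking."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self._order = cycle(self.servers)  # until the first poll, plain round robin

    def poll(self, response_ms: dict[str, float]) -> None:
        """Record one polling round; servers missing from response_ms didn't answer."""
        ranked = sorted((s for s in self.servers if s in response_ms),
                        key=lambda s: response_ms[s])
        if not ranked:
            raise RuntimeError("no COM+ server answered the poll")
        self._order = cycle(ranked)  # fastest server gets the next activation

    def next_server(self) -> str:
        """Server that should receive the next COM+ activation request."""
        return next(self._order)


router = ComponentRouter(["com1", "com2", "com3"])
router.poll({"com1": 40.0, "com2": 5.0, "com3": 12.0})
picks = [router.next_server() for _ in range(4)]  # ['com2', 'com3', 'com1', 'com2']
```

Replacing the cycle at every poll mirrors the text: the ranking stays fixed between polls and is rebuilt from fresh response times at the next one.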
Because all this processing takes place over the network in realtime, you can see that network contention can be a problem if you add CLB to a slow or congested network. You should deploy CLB clusters on a fast network backbone of at least 100Mbps. You don't typically put a CLB cluster on the regular corporate network where all the other network traffic lives.
Distributing COM+ objects in a CLB cluster doesn't make sense in all situations; you must base the decision to use CLB on an analysis of your application requirements. Clustering adds the overhead associated with the client requests that traverse the network. Clustering also adds the overhead of selecting a server and activating the COM+ object to satisfy the client request. In some cases in which applications use a small number of lightweight COM+ objects, simply instantiating the objects locally on the Web server might provide better performance. You should consider using CLB in the following three scenarios:
- The COM+ objects that your business logic comprises are relatively "heavy" and must always run on the fastest server.
- Security is a major concern, and you want to isolate COM+ objects by placing them behind an additional firewall.
- Your COM+ applications are partitioned into multiple tiers for development or design reasons, and you need to employ CLB to separate the tiers.
CLB isn't available in any of the Win2K Server family of products, nor can you purchase it as a standalone product. Originally, Microsoft intended to include CLB in the Win2K Server family, but in September 1999, the company pulled it from Win2K Release Candidate 2 (RC2) to put it into the newly announced Application Center. Today, the only way to get CLB is with Application Center.
Application Center
Application Center is part of Microsoft's .NET Enterprise Server family, whose precursor was Windows Distributed interNet Applications (Windows DNA) servers. Application Center's purpose is to be a single management point for your Web farm (i.e., multiple physical Web servers working together to serve common Web content), providing a unified user interface (UI) and leveraging both NLB and CLB for load balancing. Using Application Center, you can create clusters, join existing clusters, add and remove cluster members, deploy new content, configure load balancing, and monitor cluster performance. The result is a Web farm that looks like a single Web server to the outside user and is scalable, easy to manage, and highly reliable. These capabilities are important as an increasing number of critical applications become Web-based.
To see the full topology of Application Center, with all of Microsoft's clustering technologies working together, look again at Figure 3. The NLB cluster could be a cluster of IIS servers, for example, and the CLB cluster could provide the business logic. Together, the NLB and CLB clusters embody the Application Center Web cluster, and the database cluster uses MSCS.
Suppose you have an e-commerce site and are planning a big product rollout, during which many customers will want to buy the product. This situation will significantly increase Web site traffic, but you're not sure by how much. You've always just added servers as you needed them, but setting them up is a pain. For the product rollout, you'd like to be able to scale the performance of the Web site by adding servers to your Web farm as easily as you can plug in disk drives to a RAID set. The notion of RACs applies in precisely this type of scenario.
Application Center provides wizards for creating a cluster, adding new servers to a cluster, and deploying new content and configurations to cluster members. When you create a new cluster, you define a cluster controller that not only participates in the cluster but also owns all the configuration information. Then, you can specify additional members for the cluster. When you do so, Application Center deploys the following to each new cluster member:
- COM+ settings
- CryptoAPI settings
- Registry keys
- Windows Management Instrumentation (WMI) settings
- file-system information
- IIS metabase settings
- Web server content

You end up with a cluster of clones, and you can use Application Center Administrator to add members to or remove them from the cluster with little effort. Plus, Application Center transparently handles the usually tedious NLB configuration and deployment.
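As a conceptual illustration of the "cluster of clones" idea, this Python sketch keeps one image on the controller and copies it onto each new member. It is not Application Center's API or replication mechanism. The Server and WebCluster classes, the image dictionary, and setting names such as MaxConnections are hypothetical. Deployment changes the controller first and then pushes the same change to every other member, so all members stay identical.

```python
from __future__ import annotations
import copy
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    # Hypothetical image: category (e.g., "IIS metabase") -> settings or content.
    image: dict[str, dict] = field(default_factory=dict)


@dataclass
class WebCluster:
    controller: Server  # participates in the cluster and owns the configuration
    members: list[Server] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.members = [self.controller, *self.members]

    def add_member(self, server: Server) -> None:
        # Clone the controller's entire image so the new member is identical.
        server.image = copy.deepcopy(self.controller.image)
        self.members.append(server)

    def remove_member(self, name: str) -> None:
        if name == self.controller.name:
            raise ValueError("this sketch does not allow removing the controller")
        self.members = [m for m in self.members if m.name != name]

    def deploy(self, category: str, data: dict) -> None:
        # Update the controller first, then push the same change to the others.
        self.controller.image[category] = copy.deepcopy(data)
        for member in self.members:
            if member is not self.controller:
                member.image[category] = copy.deepcopy(data)


ctl = Server("web1", {"IIS metabase": {"MaxConnections": 1000}})
farm = WebCluster(ctl)
farm.add_member(Server("web2"))
farm.deploy("Web content", {"index.htm": "v2"})
```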
Application Center will support third-party IP load balancers in addition to NLB. As of this writing, Microsoft is working on support for Cisco Systems' LocalDirector, F5 Networks' BIG-IP, and Alteon WebSystems' ACEdirector. However, Application Center doesn't integrate these load balancers' administrative operations the way it does NLB's administration, so maintaining third-party load balancers will require some extra work.
You can install the full version of Application Center, which includes all the necessary components to create an Application Center cluster, on the following Win2K versions:
- Win2K Server Service Pack 1 (SP1)
- Win2K AS SP1
- Datacenter SP1
You must have Win2K SP1 to run the full version of Application Center. If you don't, you can install only Application Center Administrator, which lets you remotely administer Application Center and IIS. The following Win2K and NT versions support Application Center Administrator:
- Win2K Professional
- Win2K Server
- Win2K AS
- NT Workstation 4.0 SP6 or later (x86 only)
- NT Server 4.0 SP6 or later (x86 only)
You will be able to purchase Application Center, which should be shipping by the end of this year, separately as one of Microsoft's new .NET Enterprise Server products.
You might be wondering how Microsoft's clustering solutions fit into the company's next-generation Windows services, dubbed Microsoft .NET. Figure 4 compares Win2K and Windows DNA to Windows.NET and the .NET platform. Microsoft's cluster solutions fit into two areas of Microsoft .NET: Windows.NET (MSCS) and .NET Enterprise Servers (NLB, CLB, Application Center). Windows.NET represents the evolution of the Win2K OS, code-named Whistler and Blackcomb for its next two revisions. The .NET Enterprise Servers are the following products (with integrated XML capabilities):
- Exchange 2000
- SQL Server 2000
- BizTalk Server 2000
- Application Center 2000
- Host Integration Server 2000
- Commerce Server 2000
- Internet Security and Acceleration (ISA) Server 2000
- Mobile Information 2001 Server
Windows.NET and .NET Enterprise Servers are two foundational components that Microsoft will enhance as it moves forward with the .NET platform. Microsoft is serious about Windows' scalability, availability, and reliability. Delivering on these characteristics becomes even more important as Microsoft advances .NET further into the Internet world as the platform for business.
Clearly, the presence of robust clustering technologies such as MSCS, NLB, CLB, and Application Center is vital to Microsoft's success in supporting business-critical applications. I expect to see enhancements and extensions to these clustering technologies, especially in Application Center and in new .NET technologies such as Active Server Pages+ (ASP+) load balancing, which Microsoft introduced at the Professional Developers Conference earlier this year. Take the time now to understand how these powerful clustering solutions work and how they might address your business problems.