Load Balancing and Scaling Your WCF Services

WCF Design and Configuration Recommendations for Distributed Environments

In previous installments of this column for asp.netPRO I've discussed topics related to this month's subject, such as "Concurrency and Throttling Configurations for WCF Services," "Proxies and Exception Handling," and "WCF Proxies: To Cache or Not to Cache?" This month's column explores the impact of these WCF features on load balancing and scalability scenarios.

Deploying WCF services in a distributed environment requires developers to be aware of certain design issues and configuration settings that can affect performance and scalability. IT staff are not likely to be versed in all things WCF, so it is up to the developer to bridge the gap between IT and development by probing into the logistics of production deployment. In this article I'll review WCF considerations for distributed scenarios, specifically those related to load balancing and scalability. I'll start by providing some guidance on the service-level goals you should aim for, then dive into WCF specifics, including the effect of WCF sessions, binding considerations, channel creation overhead, how approaches differ for client-server and server-server scenarios, counting service hops, and throttling.

Performance, Throughput, and Scalability

Before diving into the WCF specifics, it helps to understand the difference between performance, throughput, and scalability, three important characteristics of a distributed environment.

Performance refers to the time it takes to complete a request; it often is measured in two ways:

  • Execution time refers to the time it takes to complete a request from the time it reaches the server machine to the time the last byte is returned to the client. This measurement does not take into account latency between the end-user machine and the system.
  • Response time refers to the time it takes to complete a request from the user's viewpoint, or perceived performance. This measures from the time the request is issued to the time the first byte is received.

Clearly you have more control over execution time, which is influenced by available server resources (CPU, memory, disk IO) and communication overhead between moving parts of the application (for example, crossing process or machine boundaries).

Throughput refers to the number of requests per unit of time, usually measured in requests per second. Once again, server resources and communication overhead can influence this number, in addition to any throttling configurations that limit concurrent calls. Throttling is important because it prevents server machines from being maxed out, which can cause catastrophic crashes that prevent new requests from being processed altogether.
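To see how concurrency and execution time combine into throughput, a rough estimate can be made with Little's Law: sustained throughput is approximately the number of requests in flight divided by the average execution time. The C# sketch below (the class and method names are hypothetical) applies it with integer milliseconds. It assumes the server stays busy at the given concurrency and is no substitute for load testing.

```csharp
using System;

// Back-of-the-envelope estimate based on Little's Law:
// throughput (requests/second) = requests in flight / average execution time (seconds).
// Inputs are hypothetical; real values should come from load tests.
public static class ThroughputEstimate
{
    public static int RequestsPerSecond(int concurrentCalls, int averageExecutionMs)
    {
        if (concurrentCalls <= 0)
            throw new ArgumentOutOfRangeException(nameof(concurrentCalls));
        if (averageExecutionMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(averageExecutionMs));

        // Each busy slot completes 1000 / averageExecutionMs requests per second,
        // so multiply by the number of slots (integer arithmetic, rounded down).
        return concurrentCalls * 1000 / averageExecutionMs;
    }
}
```

For example, RequestsPerSecond(30, 100) gives 300 and RequestsPerSecond(30, 200) gives 150, both inside the throughput range listed in Figure 1. The same arithmetic shows why a throttle that caps concurrent calls too low also caps throughput.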

Scalability refers to the system's ability to provision new resources without impact to the application. This can mean adding new server machines that the application seamlessly uses (horizontal scaling), or adding resources to a particular machine to handle more load (vertical scaling). Typically, vertical scaling has no impact on application design and configuration, but horizontal scaling can be a problem if applications have server affinity (as with sessions), depending on how the system is configured to manage that affinity.

The table in Figure 1 summarizes some common Service Level Agreement (SLA) goals for performance, throughput, and scalability. Of course, real-time applications may have stricter performance requirements, but these benchmarks are usually satisfactory for a service-oriented application.



Target SLA Goals

Performance: < 2 seconds average request execution time
Throughput: 150-500 requests/second on a single server with 4 CPUs, depending on the overhead of each request
Scalability: Ability to add new servers with minimal configuration effort

Figure 1: SLA goals for performance, throughput, and scalability.

Now I'll take a look at the WCF features that can influence these measurements.

Load Balancing

Horizontal scaling implies distributing load across multiple server machines in a load-balanced environment. When multiple concurrent requests are received, they are distributed among the available machines, usually with a round-robin approach or (better) by an algorithm that determines which machine has the fewest active requests being processed. Software load balancers (such as Windows Network Load Balancing, or NLB) or hardware load balancers (appliances such as those from Cisco) are usually responsible for the algorithm used to distribute load.
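The two distribution strategies can be modeled in a few lines. This is a hypothetical, single-threaded C# sketch (ServerPool and its members are illustrative names, not an NLB or appliance API); a real load balancer implements the equivalent logic internally and safely across threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of two load-balancing strategies. Not thread-safe; illustration only.
public sealed class ServerPool
{
    private readonly string[] servers;
    private readonly Dictionary<string, int> activeRequests;
    private int nextIndex;

    public ServerPool(params string[] servers)
    {
        if (servers == null || servers.Length == 0)
            throw new ArgumentException("At least one server is required.", nameof(servers));
        this.servers = servers;
        activeRequests = servers.ToDictionary(s => s, s => 0);
    }

    // Round-robin: hand out servers in a fixed rotation, ignoring current load.
    public string NextRoundRobin()
    {
        string server = servers[nextIndex];
        nextIndex = (nextIndex + 1) % servers.Length;
        return server;
    }

    // Least active requests: pick the server with the fewest requests in flight
    // (ties go to the server listed first).
    public string NextLeastActive()
    {
        string best = servers[0];
        foreach (string s in servers)
            if (activeRequests[s] < activeRequests[best]) best = s;
        return best;
    }

    // Track requests in flight so NextLeastActive can compare servers.
    public void BeginRequest(string server) { activeRequests[server]++; }
    public void EndRequest(string server) { activeRequests[server]--; }
}
```

With servers A, B, and C, round-robin returns A, B, C, A regardless of load, while NextLeastActive returns whichever server currently has the fewest requests in flight.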

In theory, the best situation is for requests to be freely distributed to the most available machine, but sessions usually get in the way of this freedom. For WCF services, transport sessions such as TCP require load balancers to be configured for sticky IP, while application sessions, reliable sessions, and secure sessions require sticky session configuration. Failover (where, if one machine fails, another machine picks up the session where it left off) is not built in for WCF services, although application sessions can fail over if the service is a durable service.
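Sticky IP and sticky sessions amount to an affinity table kept by the load balancer: the first request for a key is routed normally, and later requests with the same key go back to the same machine. The hypothetical StickyRouter below sketches that idea in C#; the affinity key stands in for a client IP address (sticky IP) or a session identifier (sticky sessions), and the routing strategy for new keys is passed in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of session affinity. Not thread-safe; illustration only.
public sealed class StickyRouter
{
    private readonly Func<string> pickServer;
    private readonly Dictionary<string, string> affinity = new Dictionary<string, string>();

    public StickyRouter(Func<string> pickServer)
    {
        this.pickServer = pickServer ?? throw new ArgumentNullException(nameof(pickServer));
    }

    // First request for a key uses the normal strategy; later ones reuse the same server.
    public string Route(string affinityKey)
    {
        string server;
        if (!affinity.TryGetValue(affinityKey, out server))
        {
            server = pickServer();
            affinity[affinityKey] = server;
        }
        return server;
    }

    // Models the server holding this key failing: the mapping is dropped, and
    // any in-memory session state on that server is lost.
    public void Forget(string affinityKey) { affinity.Remove(affinityKey); }
}
```

Forget models a server failure: the next request for that key may land on a different machine, which is why in-memory session state does not survive failover unless it is persisted, as with durable services.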

Load Balancing and Bindings

There are several binding features that influence the ability to load-balance services. Here is a short list of standard bindings and the features that require consideration in a load-balanced scenario:

  • NetTcpBinding. This binding requires sticky IP behavior so that clients are returned to the same machine where the socket is. Aside from the socket, WCF also depends on clients returning to the same communication channel.
  • BasicHttpBinding, WebHttpBinding. These bindings result in stateless communication channels by default; however, the default behavior is to enable HTTP Keep-Alive, which can result in server affinity. To disable Keep-Alive, a custom binding must be created and the KeepAliveEnabled property set to false (see the downloadable code sample for an example of this).
  • WS[2007]HttpBinding, WS[2007]FederationHttpBinding, WSDualHttpBinding. These bindings all have secure sessions enabled by default. They also all support reliable sessions, and WSDualHttpBinding requires reliable sessions to be enabled. Both of these features require sticky sessions to ensure that requests get back to the same machine, with the same server channel. These bindings also enable HTTP Keep-Alive by default, but there is no point in disabling Keep-Alive unless sessions are disabled for the binding.
  • NetTcpContextBinding, BasicHttpContextBinding, WSHttpContextBinding. These bindings are context-aware equivalents of some of the bindings already discussed: in addition to the features already mentioned, they exchange a context between client and service. Context-aware bindings support durable services and workflow services, both of which rely on a database to store state and rehydrate instances if a message is received by an alternate machine. These bindings still require sticky sessions to maintain the same channel between client and service, but if a new client channel is created, it can pass an existing context and successfully construct the service in its current state on another machine.
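As an illustration (not the article's downloadable sample), a minimal C# sketch of a BasicHttpBinding-style custom binding with Keep-Alive disabled might look like the following. It assumes SOAP 1.1 text encoding over plain HTTP; the class name is hypothetical.

```csharp
using System.ServiceModel.Channels;
using System.Text;

// Sketch: BasicHttpBinding-like custom binding with HTTP Keep-Alive turned off.
public static class NoKeepAliveBinding
{
    public static CustomBinding Create()
    {
        // SOAP 1.1 without WS-Addressing, the message version BasicHttpBinding uses.
        var encoding = new TextMessageEncodingBindingElement(MessageVersion.Soap11, Encoding.UTF8);

        // Disable persistent connections so each request can be routed to any server.
        var transport = new HttpTransportBindingElement { KeepAliveEnabled = false };

        // Binding elements are listed top-down: encoding first, transport last.
        return new CustomBinding(encoding, transport);
    }
}
```

The same binding can be declared in configuration as a customBinding containing <textMessageEncoding messageVersion="Soap11" /> and <httpTransport keepAliveEnabled="false" />. Client and service endpoints must use equivalent bindings.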

In summary, HTTP bindings that do not have HTTP Keep-Alive, secure sessions, and reliable sessions enabled can be effectively load balanced within the context of the same client proxy. The remaining bindings have server affinity for the lifetime of the channel. Is this bad? Not necessarily. Although greater scalability can be achieved if new requests are always passed to a server with the most resources, load balancers also can look at the number of sessions living on a particular server and distribute new sessions to those servers with fewer sessions. For the benefits that sessions bring, this is usually an acceptable cost.
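For the WS* bindings, disabling Keep-Alive only helps once sessions are off. One way to do that, sketched in C# and assuming message security without a secure session is acceptable for the service: turn off EstablishSecurityContext and reliable sessions on a WSHttpBinding, then wrap it in a CustomBinding to reach the HTTP transport element. The factory class name is hypothetical.

```csharp
using System.ServiceModel;
using System.ServiceModel.Channels;

// Sketch: a WSHttpBinding variant with no secure session, no reliable session,
// and no HTTP Keep-Alive, so individual calls can be load balanced.
public static class LoadBalancedWsBinding
{
    public static CustomBinding Create()
    {
        var ws = new WSHttpBinding(SecurityMode.Message);

        // Keep message security but secure each message on its own (no security context token).
        ws.Security.Message.EstablishSecurityContext = false;

        // Reliable sessions are off by default for WSHttpBinding; set explicitly for clarity.
        ws.ReliableSession.Enabled = false;

        // Convert to a CustomBinding to reach the transport element's Keep-Alive setting.
        var custom = new CustomBinding(ws);
        custom.Elements.Find<HttpTransportBindingElement>().KeepAliveEnabled = false;
        return custom;
    }
}
```

Without a secure session, each message must be secured independently, which typically adds per-message cost; that is the price of letting every call go to any server.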

Load Balancing and Sessions

Allow me to further elaborate on the impact of various types of sessions on load-balancing scenarios. To begin, Figure 2 illustrates a scenario where an HTTP binding without sessions or Keep-Alive enabled is in use. Each call from the same proxy is sent to any available machine according to the load balancer's algorithm. Each operation is designed to manage its own state, and the service is designed as a PerCall service (no state).


Figure 2: Load balancing without sessions.


When the service is a PerSession service (an application session is present), the in-memory state of the service relies on each call from the same proxy returning to the same machine (sticky session), as shown in Figure 3. If, on the other hand, the service is a durable service, service state is stored in a database between calls (see Figure 4). That means that subsequent calls can be received by a different machine and can properly initialize the service to its current state.
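A durable service in .NET 3.5 is marked with the DurableService and DurableOperation attributes from System.WorkflowServices.dll. The C# sketch below is a hypothetical counter service. It assumes the host is configured with a context-aware binding (such as WSHttpContextBinding) and a persistence provider (such as SqlPersistenceProviderFactory) pointing at a database shared by all load-balanced machines; that configuration is not shown.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical contract for a durable counter.
[ServiceContract]
public interface ICounterService
{
    [OperationContract]
    int Increment();

    [OperationContract]
    int Complete();
}

// The instance must be serializable so its fields can be persisted between calls.
[Serializable]
[DurableService]
public class CounterService : ICounterService
{
    private int count;

    // May create a new durable instance; calls that carry an existing context
    // are dispatched to the persisted instance, on whichever server receives them.
    [DurableOperation(CanCreateInstance = true)]
    public int Increment()
    {
        return ++count;
    }

    // Marks the durable instance as complete.
    [DurableOperation(CompletesInstance = true)]
    public int Complete()
    {
        return count;
    }
}
```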


Note: I avoid using sessions in WCF services and prefer the model where each method independently manages its own state in a custom database for the application.


Figure 3: Load balancing and application sessions.


Figure 4: Load balancing and durable services.


WCF services that support transport sessions (TCP), or that simulate transport sessions over other protocols (reliable sessions or secure sessions), also require sticky IP or sticky session configuration (see Figure 5). Once again, for the lifetime of the client channel, requests must be directed to the same server channel (where the session lives). If the channel fails on either side, a new session must be created; but unlike application sessions, a new transport session can be established without impact to the client application, because no application state is lost. An exception to this might be if reliable sessions are used to send a large message in smaller parts, in which case the entire message will likely need to be re-sent.


Figure 5: Load balancing and transport sessions.


Proxy Lifetime

I've described proxy lifetime issues in a past column; however, it is important to revisit this topic in the context of this discussion. There are two key scenarios to consider: client-server and server-server.

In a client-server scenario, a Windows client application uses a proxy to call WCF services. The proxy usually has a lifetime for as long as the client application is running. Calls from the same proxy instance will have server affinity if a session is present. In terms of scalability, the system would still be able to distribute calls from different clients (proxies) among load-balanced servers.

In a client-server scenario, the presence of a session has two important considerations that can impact performance: channel creation overhead in the event the application is multithreaded, and exception management when something happens to tear down the underlying client or server channel.

Because channel creation is expensive, a multithreaded client application in which each thread creates its own proxy to call a service can seriously degrade perceived performance. Even though .NET 3.5 introduced automatic channel factory caching to optimize channel creation, in a client-server scenario it is better to cache the actual channel (the proxy reference) and share it among threads.
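A minimal C# sketch of this recommendation: one channel created on first use and shared by every thread. The contract, class names, and net.tcp address are hypothetical, and Lazy<T> assumes .NET Framework 4 or later.

```csharp
using System;
using System.ServiceModel;

// Hypothetical service contract.
[ServiceContract]
public interface IOrderService
{
    [OperationContract]
    string GetStatus(int orderId);
}

// One shared channel for the whole client process instead of one proxy per thread.
public static class SharedOrderProxy
{
    // Hypothetical endpoint; in practice this would come from configuration.
    private static readonly ChannelFactory<IOrderService> factory =
        new ChannelFactory<IOrderService>(
            new NetTcpBinding(),
            new EndpointAddress("net.tcp://localhost:9000/OrderService"));

    // Lazy<T>'s default thread-safety mode guarantees CreateChannel runs only once.
    private static readonly Lazy<IOrderService> channel =
        new Lazy<IOrderService>(() => factory.CreateChannel());

    public static IOrderService Channel
    {
        get { return channel.Value; }
    }
}
```

Worker threads then call SharedOrderProxy.Channel.GetStatus(...) directly. Opening the channel explicitly (through ICommunicationObject.Open) before worker threads start using it may keep concurrent first calls from waiting on the channel's automatic open.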

Note: I wrote about channel caching options and .NET 3.5 features in the July 2008 column.

If the channel has a transport session (not an application session), it is best if the service allows multiple concurrent requests to the same channel. For this the service must be configured to support multiple concurrent calls even if it is a PerCall service:

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class PerCallService : IPerCallService

For services with InstanceContextMode PerSession or Single it may be best to leave ConcurrencyMode as its default value, Single. This way, only a single thread can access the shared service instance. In the case of PerCall, each thread always gets its own service instance, which means you are really only allowing multiple threads access to the server channel, not to the same service instance.

As for exception management, in the presence of sessions remember that an uncaught exception or a timeout can put the channel into a faulted state, rendering the proxy useless. If the channel did not have an application session, the user likely doesn't care about the exception, and you should silently create a new channel. If an application session was in progress and the service is not durable, the user should probably be notified of the failure before a new channel is constructed.
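Silently recreating a faulted channel can be wrapped in a small helper. The hypothetical RecoveringProxy below (C#; names and contract are illustrative) hands out the cached channel until it is faulted or closed, then aborts it and creates a replacement from the same ChannelFactory. It is only appropriate when no application session is in progress; otherwise the user should be notified first.

```csharp
using System;
using System.ServiceModel;

// Hypothetical service contract.
[ServiceContract]
public interface IOrderService
{
    [OperationContract]
    string GetStatus(int orderId);
}

// Caches one channel and replaces it once it can no longer be used.
public sealed class RecoveringProxy<TChannel> where TChannel : class
{
    private readonly ChannelFactory<TChannel> factory;
    private readonly object sync = new object();
    private TChannel channel;

    public RecoveringProxy(ChannelFactory<TChannel> factory)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        this.factory = factory;
    }

    public TChannel Channel
    {
        get
        {
            lock (sync)
            {
                var comm = channel as ICommunicationObject;
                if (comm == null
                    || comm.State == CommunicationState.Faulted
                    || comm.State == CommunicationState.Closed)
                {
                    // A faulted channel cannot be closed gracefully, so abort it.
                    if (comm != null) comm.Abort();
                    channel = factory.CreateChannel();
                }
                return channel;
            }
        }
    }
}
```

A caller that catches a CommunicationException simply reads Channel again on the next call and gets a fresh channel.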

Note: I wrote about exception handling techniques for this scenario in the January 2008 column.

In a server-server scenario you may have an ASP.NET application or another WCF service living in the DMZ calling downstream services. In this scenario, proxy lifetime management should be handled differently. You should never cache the channel and share it among threads, as this creates server affinity when calling downstream services. Though initially this may give the illusion of throughput, as the number of users and concurrent threads increases you quickly hit a cap on throughput. You may be able to cache the channel factory (something that .NET 3.5 can handle for you) if the same credentials are used for all callers; for example, if a certificate is used to authenticate to downstream services. This doesn't work for scenarios where you must attach a supporting token to each call, such as one that represents the initial caller and their roles. In that case a new channel factory and channel (proxy) must be created for each call.
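A sketch of the cached-factory approach in C#: a single ChannelFactory<T> shared by all requests (assuming every caller uses the same credentials, such as one service certificate) and a fresh channel per call, closed afterward or aborted on failure. The contract, address, and InventoryClient helper are hypothetical.

```csharp
using System;
using System.ServiceModel;

// Hypothetical downstream contract.
[ServiceContract]
public interface IInventoryService
{
    [OperationContract]
    int GetStock(string sku);
}

public static class InventoryClient
{
    // Shared factory; credentials (e.g. a certificate) would be set on factory.Credentials.
    private static readonly ChannelFactory<IInventoryService> factory =
        new ChannelFactory<IInventoryService>(
            new BasicHttpBinding(),
            new EndpointAddress("http://inventory.example/InventoryService"));

    // Creates a new channel for each call so no request thread is pinned to one server.
    public static TResult Call<TResult>(Func<IInventoryService, TResult> operation)
    {
        IInventoryService channel = factory.CreateChannel();
        var comm = (ICommunicationObject)channel;
        bool closed = false;
        try
        {
            TResult result = operation(channel);
            comm.Close();
            closed = true;
            return result;
        }
        finally
        {
            // If the call or Close failed (for example, a faulted channel), abort instead.
            if (!closed) comm.Abort();
        }
    }
}
```

Usage is InventoryClient.Call(svc => svc.GetStock("ABC-123")). If each call must carry a per-caller supporting token, the factory cannot be shared and would be created inside Call as well.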

Because the channel is not cached, this scenario need not be concerned with session expiry or exception handling for faulted channels.

Limiting Service Hops

In a distributed environment it is likely there are at least two service hops in the context of a single request thread. Because the server-server hops are likely to include the overhead of constructing a new channel for each call, this can quickly add too much overhead to the call chain of the request thread. Generally speaking, it is a good idea to stick to two or three service hops for a single request thread. Anything beyond this should be carefully benchmarked to make sure it yields good enough performance to meet SLA requirements. Remember that your goal should be to achieve the necessary performance for the application while still benefiting from a service-oriented application design.

Instance Throttling

It is important to allow the right number of concurrent requests and sessions for your WCF services. Allowing too many requests and sessions on a single machine can cause it to fail, but allowing too few limits the server from realizing its potential throughput. WCF provides a default ServiceThrottlingBehavior for each service to control the number of concurrent requests, sessions, and service instances, as follows:





       maxConcurrentSessions="10" />



The setting for maxConcurrentCalls controls how many concurrent threads are allowed for the service type. A good number to start with is 30 concurrent calls, which is similar to the default number of thread pool threads allocated by the ASP.NET runtime.

The setting for maxConcurrentSessions controls how many concurrent transport sessions can be created for the service type. This number should definitely be increased so that more than 10 clients can connect to the service within a particular host process. Limiting it to 10 means that only 10 TCP sessions, reliable sessions, or secure sessions are allowed, which effectively limits the number of clients that can initialize a proxy to communicate with the service. Estimate this number based on the usage patterns of application users. For example, if all users log in to the application every morning in a corporate environment, you can expect the full user count to be distributed across load-balanced machines. On the other hand, if only a percentage of users are usually online concurrently, the collective number across load-balanced machines can be significantly lower than the number of application users.
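Sizing maxConcurrentSessions from usage patterns is simple arithmetic. The hypothetical helper below, in C#, divides the expected concurrent users across the load-balanced servers and adds headroom; the 25 percent default headroom is an assumption, not a WCF recommendation. Integer percentages keep the rounding exact.

```csharp
using System;

// Hypothetical sizing helper for maxConcurrentSessions per server.
public static class SessionCapacity
{
    // users * concurrent% * headroom% / servers, rounded up.
    public static int PerServer(int totalUsers, int concurrentPercent, int servers,
                                int headroomPercent = 125)
    {
        if (totalUsers < 0) throw new ArgumentOutOfRangeException(nameof(totalUsers));
        if (concurrentPercent < 0 || concurrentPercent > 100)
            throw new ArgumentOutOfRangeException(nameof(concurrentPercent));
        if (servers <= 0) throw new ArgumentOutOfRangeException(nameof(servers));
        if (headroomPercent < 100) throw new ArgumentOutOfRangeException(nameof(headroomPercent));

        // Both percentages are scaled by 100, so divide by 100 * 100 * servers.
        long scaledSessions = (long)totalUsers * concurrentPercent * headroomPercent;
        long divisor = 100L * 100L * servers;
        return (int)((scaledSessions + divisor - 1) / divisor); // integer ceiling
    }
}
```

For 2,000 users on four servers, PerServer(2000, 100, 4) gives 625 sessions per server when everyone logs in each morning, and PerServer(2000, 20, 4) gives 125 when only a fifth of users are online at once; both are well above the default of 10.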

The setting for maxConcurrentInstances will naturally be throttled by the other two settings, so under most circumstances you can leave this value alone.
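Throttling can also be applied in code before the host is opened. This C# sketch targets the .NET Framework ServiceHost; the service, contract, and helper names are hypothetical. It reuses the ServiceThrottlingBehavior already loaded from configuration if there is one, otherwise adds one, and sets calls and sessions while leaving maxConcurrentInstances at its default.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

[ServiceContract]
public interface IPerCallService
{
    [OperationContract]
    string Echo(string text);
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class PerCallService : IPerCallService
{
    public string Echo(string text) { return text; }
}

public static class ThrottlingSetup
{
    // Must run before host.Open(); the values passed in are starting points to tune.
    public static void Apply(ServiceHost host, int maxCalls, int maxSessions)
    {
        var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }
        throttle.MaxConcurrentCalls = maxCalls;
        throttle.MaxConcurrentSessions = maxSessions;
        // MaxConcurrentInstances is left at its default, as suggested above.
    }
}
```

For example, Apply(host, 30, 625) followed by host.Open() starts the service with 30 concurrent calls and 625 concurrent sessions.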


Developers should consider the impact of load balancing and scalability in their service design by doing the following:

  • Disable HTTP Keep-Alive to remove server affinity for simple HTTP bindings.
  • Make services durable if application sessions are supported.
  • Cache proxies in multithreaded client-server scenarios and silently recreate proxies as needed when channels are faulted.
  • Cache the channel factory if possible for server-server scenarios.
  • Allow multiple threads access to PerCall services to support multithreaded clients.
  • Try to keep service hops to two or three per request thread and benchmark as hops are added to verify good enough performance can be achieved.

IT should consider the following:

  • Configure load balancers for sticky IP or sticky sessions as needed where sessions are supported.
  • Ensure throttling configuration is sufficient for application throughput.
  • Monitor performance counters to ensure that performance and throughput results are meeting SLA requirements, adjusting configurations as necessary.

Another aspect of load balancing worth discussing is how to configure WCF to work effectively with F5 BIG-IP servers that process SSL and forward unencrypted messages to the service. This is an advanced subject that deserves an article of its own, so I will address it in next month's column.

Download the samples for this article at http://www.dasblonde.net/downloads/aspprodec08.zip.


Michele Leroux Bustamante is Chief Architect of IDesign Inc., Microsoft Regional Director for San Diego, and Microsoft MVP for Connected Systems. At IDesign Michele provides training, mentoring, and high-end architecture consulting services focusing on Web services, scalable and secure architecture design for .NET, federated security scenarios, interoperability, and globalization architecture. She is a member of the International .NET Speakers Association (INETA), a frequent conference presenter, conference chair for SD West, and is frequently published in several major technology journals. Michele also is on the board of directors for IASA (International Association of Software Architects), and a Program Advisor to UCSD Extension. Her latest book is Learning WCF (O'Reilly, 2007); visit her book blog at http://www.thatindigogirl.com. Reach her at mailto:[email protected], or visit http://www.idesign.net and her main blog at http://www.dasblonde.net.
