Concurrency and Throttling Configurations for WCF Services

Control the Number of Concurrent Requests to Each Service

RELATED: "Load Balancing and Scaling Your WCF Services" and "Proxies and Exception Handling"

In my last column, "WCF Service Instancing," I explained instancing modes for WCF services, which control the lifetime of each service instance allocated to a request thread. Although PerSession instancing is the default, PerCall instancing is the preferred setting for server deployments that must support a large number of client requests. This month I'll discuss other settings that also influence overall throughput to your WCF Web services.


Scalability and Throughput Features

Scalability and throughput requirements differ between services hosted on a client machine and those deployed to server environments. Services hosted in-process are initialized and invoked on demand; those hosted on client machines are, at most, consumed by multiple client threads. Services deployed to server machines (either Web servers exposed to the Internet or servers behind the firewall that satisfy intranet clients) can expect to serve a significantly higher number of concurrent requests. The number of requests may be predictable if the number of clients is controlled, or it may grow rapidly when the client base is much wider and continues to expand.


Ideally, your services will always be ready to process incoming requests and handle the expected load without exhausting host machine resources and crippling the system. WCF features that support this need include instancing mode, concurrency mode, and throttling behaviors. As I discussed in "WCF Service Instancing," instancing mode controls the lifetime of each service instance, letting you allocate an instance per call, per session, or a single instance for all clients. Concurrency mode controls whether and how each individual service instance allows concurrent calls, which can affect throughput. Throttling behaviors allow you to control the request load to each service, restricting the number of concurrent calls, the number of sessions allocated, and the number of service instances.


Concurrency Mode

Concurrency issues arise when multiple threads attempt to access the same resources at run time. When requests arrive at a service, the service model dispatches each message on a thread from the thread pool. If multiple clients call the same service, multiple concurrent request threads can arrive for that service. Which service object handles each request depends on the instancing mode for the service. For PerCall services, a new service object is allocated for each request. For PerSession services, the same service object receives all requests from the same client (or proxy). For Single instancing mode, all client requests are sent to the same singleton service object. Based on this alone, PerSession services are at risk of concurrent access when the client is multithreaded, and Single services are perpetually at risk.


The concurrency setting for a service is controlled by the ConcurrencyMode property of the ServiceBehaviorAttribute. By default, only one request thread is granted access to any service object, regardless of the instancing mode; this is because the default setting for ConcurrencyMode is Single, as shown here:



[ServiceBehavior(ConcurrencyMode=ConcurrencyMode.Single)]
public class MessagingService : IMessagingService


This property can be set to any of the following ConcurrencyMode enumeration values:

  • Single. A single request thread has access to the service object at a given time.
  • Reentrant. A single request thread has access to the service object, but the thread can exit the service to call another service (or client callback) and reenter without deadlock.
  • Multiple. Multiple request threads have access to the service object and shared resources must be manually protected from concurrent access.


The following sections briefly describe each mode, and discuss their relevance to Web service deployments.


Single Concurrency Mode

By default, services are configured for Single concurrency mode. This means that a lock is acquired for the service object while a request is being processed by that object. Other calls to the same object are queued in order of receipt at the service, subject to the client's send timeout or the service's session timeout, if applicable. When the request that owns the lock has completed, and thus released the lock, the next request in the queue can acquire the lock and begin processing. This configuration reduces the potential throughput at the service when sessions or singletons are involved, but it also carries the least risk of concurrency issues.


Configuring services for Single access doesn't impact PerCall services because a new service instance is allocated for each request, as shown in Figure 1.


Figure 1: PerCall instancing mode with Single concurrency.


For PerSession services, Single concurrency disallows multiple concurrent calls from the same (multithreaded) client, while not impacting throughput of multiple clients (see Figure 2); for Single instancing mode, only one request can be processed at a time across all clients (see Figure 3).

Figure 2: PerSession instancing mode with Single concurrency.


Figure 3: Single instancing mode with Single concurrency.


As I've said, when you expose WCF services over HTTP as Web services, chances are you'll be using PerCall configuration. Sessions for WCF Web services are usually better facilitated by persisting data between calls to a database, rather than using an application session (which is not durable). That means the default concurrency mode setting of Single will not reduce the potential throughput of requests to your application.


Reentrant Concurrency Mode

Reentrant mode is necessary when a service issues callbacks to clients, unless the callback is a one-way operation. That's because the outgoing call from service to client would not be able to return to the service instance without causing a deadlock. This mode is also necessary when a service calls out to a downstream service that, in turn, calls back into the same service instance.


Services configured for Reentrant concurrency mode behave similarly to Single mode, in that concurrent calls are not supported from clients; however, if an outgoing call is made to a downstream service or to a client callback, the lock on the service instance is released so that another call is allowed to acquire it. When the outgoing call returns, it is queued to acquire the lock to complete its work. Figure 4 illustrates how PerCall services would behave with and without reentrancy for non-one-way callbacks. In this case, the only thread that might need to reenter the service is likely an outgoing callback. Likewise, if the service were to call services downstream that later attempted to call back into the top-level service, reentrancy would allow it (however, it is poor design to have circular service references).


Figure 4: Comparing PerCall instancing mode with Single or Reentrant concurrency on non-one-way calls.


Because each request thread gets its own service instance, callbacks are the primary scenario that applies to your PerCall Web services. Thus, if you are using WSDualHttpBinding and your callbacks aren't one-way, you'll set the concurrency mode to Reentrant. You should also pay close attention to calls to downstream services that may need to call back to upstream services.
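As a minimal sketch of the callback scenario just described, the following shows a PerCall service that makes a request-reply (non-one-way) callback and is therefore marked Reentrant. The contract and class names (INotificationService, INotificationCallback, NotificationService) are hypothetical and not taken from the article's sample code; the example assumes a reference to System.ServiceModel and a duplex binding such as WSDualHttpBinding.

```csharp
using System.ServiceModel;

// Hypothetical callback contract. Confirm is request-reply (not one-way),
// so the service waits for the client's reply.
public interface INotificationCallback
{
    [OperationContract]
    bool Confirm(string message);
}

// Hypothetical service contract that names the callback contract.
[ServiceContract(CallbackContract = typeof(INotificationCallback))]
public interface INotificationService
{
    [OperationContract]
    void Publish(string message);
}

// Reentrant releases the lock on the service instance during the outgoing
// callback, so the callback's reply can be dispatched back to the instance.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall,
                 ConcurrencyMode = ConcurrencyMode.Reentrant)]
public class NotificationService : INotificationService
{
    public void Publish(string message)
    {
        INotificationCallback callback =
            OperationContext.Current.GetCallbackChannel<INotificationCallback>();

        // Under ConcurrencyMode.Single, this non-one-way callback could not
        // return to the instance without a deadlock.
        callback.Confirm(message);
    }
}
```

If Confirm were declared with IsOneWay = true, the service would not wait for a reply and the default Single mode would be sufficient.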


Multiple Concurrency Mode

Services configured for Multiple concurrency mode allow multiple threads to access the same service instance. In this case, no locks are acquired on the service instance and all shared state and resources must be protected with manual synchronization techniques. This setting is useful for increasing throughput to services configured for PerSession or Single instancing mode.
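To illustrate what manual synchronization can look like, here is a rough sketch of a singleton service configured for Multiple concurrency that protects its shared counter with a lock. The shape of ICounterService and the CounterService class are assumptions made for illustration; the example assumes a reference to System.ServiceModel.

```csharp
using System.ServiceModel;

// Hypothetical contract; the members are assumed for illustration only.
[ServiceContract]
public interface ICounterService
{
    [OperationContract]
    int Increment();
}

// One instance serves all clients, and WCF takes no lock on it,
// so several request threads may run Increment at the same time.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class CounterService : ICounterService
{
    private readonly object _sync = new object();
    private int _count;

    public int Increment()
    {
        // Shared state is protected manually; only the critical section
        // is serialized, not the whole request.
        lock (_sync)
        {
            _count++;
            return _count;
        }
    }
}
```

Keeping the locked region as small as possible is what preserves the throughput gain over Single concurrency mode.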


Instance Throttling

To increase throughput at the service, multiple concurrent calls must be allowed to process. PerCall services can support multiple concurrent calls by default because each call is allocated its own service instance. PerSession and Single instancing mode services can allow multiple concurrent requests when configured for Multiple concurrency mode. However, regardless of the concurrency mode, server resources are not generally capable of servicing an unlimited number of concurrent requests. Each request may require a certain amount of processing, memory allocation, hard disk access, network access, and other overhead.


WCF provides a throttling behavior, with the following properties, to manage server load and resource consumption:

  • MaxConcurrentCalls. Limits the number of concurrent requests that can be processed by all service instances. The default value is 16.
  • MaxConcurrentInstances. Limits the number of service instances that can be allocated at a given time. For PerCall services, this setting matches the number of concurrent calls. For PerSession services, this setting matches the number of active session instances. This setting doesn't matter for Single instancing mode, because only one instance is ever created. The default value for this setting is 2,147,483,647.
  • MaxConcurrentSessions. Limits the number of active sessions allowed for the service. This includes application sessions, transport sessions (for TCP and named pipes, for example), reliable sessions, and secure sessions. The default value is 10.


Each of these settings is applied to a particular service configured through its ServiceHost instance (associated with the .svc file when hosting with IIS or WAS). To set these values declaratively, you associate a service behavior with the service and add the <serviceThrottling> section to that behavior. Figure 5 shows a service behavior with the default throttling values.






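<system.serviceModel>
  <services>
    <service ... >
      <endpoint ... contract="Counters.ICounterService" />
    </service>
  </services>
  <behaviors>
    <serviceBehaviors>
      <behavior ... >
        <serviceThrottling maxConcurrentCalls="16"
          maxConcurrentInstances="2147483647" maxConcurrentSessions="10" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>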




Figure 5: Default service throttling values.
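For self-hosted services, the same throttle can also be applied in code through ServiceThrottlingBehavior. The following is a sketch rather than the article's sample: the ThrottledHost helper and its Open method are hypothetical names, the values are the defaults listed above, and the endpoints are assumed to be defined in the application configuration file.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

// Hypothetical helper for illustration only.
public static class ThrottledHost
{
    // Creates and opens a host for serviceType with explicit throttle values.
    // Endpoints are assumed to come from configuration.
    public static ServiceHost Open(Type serviceType, Uri baseAddress)
    {
        ServiceHost host = new ServiceHost(serviceType, baseAddress);

        // Reuse the throttling behavior if configuration already added one.
        ServiceThrottlingBehavior throttle =
            host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }

        // Behaviors must be set before the host is opened.
        throttle.MaxConcurrentCalls = 16;
        throttle.MaxConcurrentInstances = int.MaxValue; // 2,147,483,647
        throttle.MaxConcurrentSessions = 10;

        host.Open();
        return host;
    }
}
```

Setting the values in configuration, as in Figure 5, is usually the more flexible choice because the throttle can then be tuned without recompiling.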


The appropriate settings for throttling behavior depend on a number of factors, including the instancing mode for the service, the number of services exposed by the application, and the desired outcome of throttling. In the next sections I'll discuss throttling in the context of these different factors.



MaxConcurrentCalls

The throttle for MaxConcurrentCalls affects the number of concurrent request threads the service can process across all of its exposed endpoints. Regardless of whether the instancing mode is PerCall, PerSession, or Single, approach this setting as a limit on the number of active threads for a particular service, which allows you to do the math and estimate the number of requests that can be processed per second. For example, if a PerCall service with one or more endpoints allows 30 concurrent requests, and each request averages 0.2 seconds, roughly 150 requests per second can be processed by a particular worker process (assuming IIS hosting over HTTP). Multiply by the number of worker processes and that number increases for a single machine in your Web server tier.
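The estimate in this paragraph is simple division. As a rough sketch, assuming every request takes the average time and all 30 slots stay busy:

```latex
\[
\text{requests per second}
  \approx \frac{\text{concurrent requests}}{\text{average seconds per request}}
  = \frac{30}{0.2}
  = 150
\]
```

With, say, two worker processes (a hypothetical count), the same reasoning gives roughly 2 × 150 = 300 requests per second for the machine.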


If you host two services in the same application, each allowing 30 concurrent requests, at full capacity 60 concurrent requests can execute. As you increase the number of services, this can eventually have a negative effect on throughput, because a growing number of threads increases the context switching required to execute them concurrently. For this reason you'll want to consider the potential use of each service alongside the total number of concurrent threads that is optimal. By the same token, you don't want to limit the number of concurrent requests to a particular service so tightly that queued requests begin to time out.


Now, what I just said about the number of concurrent requests increasing as you add services to the application applies only to WCF services that are not hosted by IIS or WAS over HTTP. With IIS and WAS hosting, ASP.NET is engaged in the processing of requests, at least to forward the request from the ASP.NET request thread to the WCF thread. If the call is one-way, the ASP.NET thread is released and the WCF threads will be allocated according to the throttle setting. If the call is request-reply, WCF blocks the ASP.NET thread while processing the request on the WCF thread. That means that the ASP.NET processing model is responsible for request throttling for non-one-way calls.
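Whether a call is one-way or request-reply is decided per operation in the service contract. As a small sketch (IAuditService and its operations are hypothetical names, and a reference to System.ServiceModel is assumed):

```csharp
using System.ServiceModel;

// Hypothetical contract showing both message exchange patterns.
[ServiceContract]
public interface IAuditService
{
    // One-way: no reply message is sent, so the operation must return void.
    [OperationContract(IsOneWay = true)]
    void Record(string entry);

    // Request-reply (the default): the caller waits for the result.
    [OperationContract]
    int CountEntries();
}
```

In the hosting scenario described above, only calls like Record would be governed by the WCF throttle alone; calls like CountEntries would also hold the ASP.NET request thread while they execute.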


Ideally, you want to reach somewhere between 350 and 500 requests per second on a single CPU. You should be able to achieve this by allocating 30 request threads across all services, but this is not a guarantee, as many factors can influence the outcome, including request-processing overhead and server-machine horsepower.



MaxConcurrentSessions

Setting the correct throttle value for MaxConcurrentSessions may take some creativity, because sessions have conflicting requirements: they live longer than requests, yet they also consume more resources. On the one hand, because a session lives longer than a request, you don't want to prevent users from connecting to the system if you can afford to accommodate them. On the other hand, if each session allocates a large amount of memory (or other resources), the server may be able to accommodate only so many. The number of active application sessions is traditionally low compared to the number of users in the system, but if you have one million users with 5 percent online, 50,000 sessions might still be requested at a given time.


For BasicHttpBinding, and for WSHttpBinding without reliable sessions or secure sessions, this is a non-issue because sessions are not supported for these configurations. Thus, the setting for concurrent sessions has no impact. In the case of outward-facing PerCall services that also support reliable sessions or secure sessions (via WSHttpBinding), the overhead of the session is minimal compared to application sessions that could maintain significant state. These sessions default to a 10-minute expiry, and if your service receives close to 300 requests per second, that could mean up to 180,000 requests in 10 minutes (some percentage of which are in the same session). Even at 5 percent, that's 9,000 concurrent sessions that might need to be supported to allow unique clients to get in the door. The bottom line is that you must be well aware of the usage patterns of your clients, and make sure you have the right balance to prevent request timeouts (waiting for a new session), while also preventing excessive use of server resources.
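The session estimate in this paragraph works out as follows, taking the 10-minute expiry as 600 seconds and treating the 5 percent share of requests that open a new session as an illustrative assumption:

```latex
\[
300\ \tfrac{\text{requests}}{\text{second}} \times 600\ \text{seconds} = 180{,}000\ \text{requests}
\]
\[
0.05 \times 180{,}000 = 9{,}000\ \text{concurrent sessions}
\]
```

Substituting your own request rate and new-session percentage gives a starting point for MaxConcurrentSessions that you can then refine against measured usage.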


For application sessions or transport sessions used in a traditional client-server scenario, the number of active sessions allowed should be weighed against the amount of resources consumed by each session. Ultimately, the purpose of the throttle in this case is to prevent the server from maxing out its memory usage, or that of other limited resources consumed by each session. Similarly, downstream services exposed over NetNamedPipeBinding or NetTcpBinding require a transport session, which is another resource with configurable limits on Windows systems.



MaxConcurrentInstances

The appropriate setting for MaxConcurrentInstances varies based on the instancing mode for the service. For PerCall services it should be equal to or greater than MaxConcurrentCalls. For PerSession services, MaxConcurrentInstances should meet or exceed MaxConcurrentSessions where application sessions are involved. That's because the value actually limits the number of concurrent service instances that can be kept active to support application sessions, which is much different from the number of concurrent, short-lived requests. For singleton services, MaxConcurrentInstances is irrelevant, because only one instance of the singleton is ever created.



Conclusion

Because your Web services are typically configured as PerCall services over HTTP bindings, you should take from this discussion that the default concurrency mode (Single) is acceptable unless callbacks are involved. You should also have some idea how to assess the appropriate throttling behaviors for your Web services exposed over HTTP: for concurrent requests, by assessing expected load across all services; for concurrent sessions, based on use of reliable or secure sessions; and for concurrent instances, based on the setting for concurrent requests. In the rare case that you employ application sessions for services, you must also consider the resources allocated by those sessions. In addition, you should be mindful of appropriate configurations for downstream services invoked by your Web services.


NOTE: For examples of concurrency mode and throttling configurations discussed in this article, see sample code for Chapter 5 of my book, Learning WCF (available at


Michele Leroux Bustamante is Chief Architect of IDesign Inc., Microsoft Regional Director for San Diego, Microsoft MVP for Connected Systems, and a BEA Technical Director. At IDesign Michele provides training, mentoring, and high-end architecture consulting services focusing on Web services, scalable and secure architecture design for .NET, federated security scenarios, Web services, interoperability, and globalization architecture. She is a member of the International .NET Speakers Association (INETA), a frequent conference presenter, conference chair for SD West, and is frequently published in several major technology journals. Michele is also on the board of directors for IASA (International Association of Software Architects), and a Program Advisor to UCSD Extension. Her latest book is Learning WCF (O'Reilly, 2007).



