As an ADO.NET and Visual Basic (VB) instructor and consultant, I often get questions from clients, students, the newsgroups, and list servers about using ADO.NET's connection pools. People ask questions such as:
- How can I enable and disable the connection pool?
- How many connections are already in the pool?
- ADO.NET and ADO seem to lock up after about 100 connections; why can't they open more connections?
- How can I identify the user executing the code in the connection string without quickly running out of connections?
- How can I make sure only the right people have access to the database and still leverage the connection pool?
- How can I share a common connection between different parts of my application?
After reading this article, you'll know the answers to these and many other connection-pool questions. (No, I won't be discussing pH levels or how to super-chlorinate your connection pool when it gets dirty.) I discuss how to properly connect applications to and, more importantly, disconnect them from the server when the connection pool manages your connections. In an upcoming article, I'll follow up with how to monitor the activity of the connection-pooling mechanism (aka the pooler) and how to make sure your application is using the pooler correctly—preferably before it overflows and shorts out your system.
A Little History
More than 5 years ago, Microsoft introduced pooled connections for all ODBC drivers to address several problems that developers were solving on their own. Developers needed a way to reduce the cost of establishing or reestablishing a connection—it took too long, limited scalability, and consumed too many server-side connections. The basic idea behind pooling is that when an application closes a connection, the connection handle returns to a driver- or provider-managed pool, where it remains for a given length of time so that the application can reuse it. Although this approach isn't as important in client/server applications that use only one connection, it's crucial when you create high-performance middle-tier COM+ or Active Server Pages (ASP) applications that run multiple instances of components that could safely reuse the same connection handle.
Since the first version of connection pooling, every version of ODBC has supported connection pooling by default. SQL Server 2000 resolved many early problems with connection pooling, and Microsoft included the feature in OLE DB and the .NET Framework.
How Does the Connection Pool Work?
When connection pooling is enabled (which it is by default), the data provider draws each connection from a pool of pre-opened, idle connections or makes a new connection to the database and creates a new pool. This means that when you close your connection in code, you're really just releasing the connection back to a pool of used connections; the database connection to the data source remains open and continues to hold resources on the server until the pooler times out the connection and closes it. The timeout takes between 4 and 8 minutes. However, if the same process needs to reconnect, the connection pool simply plugs it back into an existing "hot" connection—assuming the connection string and other factors haven't changed. You might not get the same connection again, but in theory, the connection you get should be functionally equivalent to the original. This means that your application can save the time that it would otherwise use to establish a brand-new connection to the server.
As with any specialized functionality, connection pooling has rules for its use. Here are some common questions you might have about the rules that govern connection pools.
When does the .NET Data Provider create a new pool? Your application might not always be able to use an existing connection. If any of the following items change, the .NET Data Provider creates a new pool:
- The ConnectionString property—This property must be the same each time ADO.NET uses the Open method to open the Connection object. Even if the only change is that the arguments are in a different order, the string is different. If you add, change, or remove arguments—even if they simply reflect the default behavior—the property is different, and you get a new pool.
- The User ID (UID) or password—If you use a specific UID for each connection, each UID gets its own pool.
- The process identifier (PID)—Each process, program, component, Web service, or Web application gets its own pool. I discuss this aspect further in a moment.
- The transaction scope or enlistment—If you're using transactions, each transaction scope gets its own pool.
Where are connection pools created? The provider always creates connection pools on the system where the code creating the connection runs: on the client. In a client/server situation (e.g., a Windows Forms application opens a connection to a remote server), the provider establishes the connection pool on the client (local system). In Web service, ASP, ASP.NET, or COM+ architectures, the code that opens the connection runs on a remote server that's hosting Microsoft IIS or Microsoft Transaction Server (MTS). In these cases, the "client" is the code running on the remote code server. Connection pools aren't created on the database server unless it happens to be hosting IIS or MTS.
How does the security setting affect pooling? As I mentioned, if you use a specific UID for each connection, each UID gets its own pool. But if you specify Integrated Security=SSPI in the ConnectionString, you get one pool for your process's User Account. This means that you can't specify a specific role or UID and share the pool that the .NET Data Provider created for another role or user.
Why does the PID affect pooling? Like every database interface since the dawn of time, ADO.NET can't share connections across processes, so each process must have its own connection pool. A client/server application gets its own PID each time you run it. If you have five instances of an application, each gets its own PID, regardless of whether you run the application on one system or five systems. If you run a middle-tier component—such as a Web service, a COM+ component, or a Web application—that component also has its own PID; but in this case, multiple instances are running the same process. This means that a Web service can serve a few clients or a few thousand clients—all of which share the same connection pool. When you're developing your Web service, each time you compile and create a new assembly, the new executable gets a new PID—and a new connection pool. So you could potentially have two nearly identical copies of an executable on your system, but each gets its own connection pool.
What happens when the pool is full? If the pool fills up with connections (the default limit is 100 connections), the .NET Data Provider waits for another instance of the application to close an existing connection. If the .NET Data Provider can't get a connection in time (i.e., within the limit that the Connection Timeout setting specifies), it throws a System.InvalidOperationException with the message Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached. You don't want to show this message to your users because it's not their fault and, generally, users can't debug your code (which caused the problem in the first place). When I get this message, I send a message to the Web master offering to fix the problem. Note that ADO.NET uses System.InvalidOperation- Exception for a variety of situations, so you need to parse the Message string to find out what's really wrong.
When are pools destroyed? Connection pools are destroyed when the process that creates them ends. Because of this, an ADO.NET Data Provider running under IIS or MTS creates connection pools and destroys them when the process or Application Domain (AppDomain) ends. Note that a bug exists in the performance counters that the SQL Server .NET Data Provider (System.Data.SqlClient) exposes. The bug falsely reports the existence of active pools. To make sure that what you're seeing isn't a "ghost" pool, stop the Performance Monitor application, take a drink of coffee, and check again. This bug isn't fixed in .NET Framework 1.1. When you're working with Windows Forms applications, pools drop off when the application or process ends. This behavior makes sense because the process creating the remote connection pool wouldn't necessarily end if IIS is acting as an agent for the Web application or Web service.
If you open another connection in code, the pool returns to life when you open its member connections. Don't confuse connection pools that the .NET Data Provider manages with the database connections that the pooler opens and closes. When your code closes a connection, the connection returns to the pool. After a period of time that the .NET Data Provider determines, the connection to the database closes if it's not reused.
Be aware that if you shut down the SQL Server system (or your database management system—DBMS) or it shuts down on its own, any existing SqlClient connection pools remain unless IIS or MTS shuts down, too. These remaining pools are called zombied pools. If you subsequently restart SQL Server, the SqlClient .NET Data Provider tries to use the zombied pools to reconnect to the server. The result is the System.Data.SqlClient.SqlException General network error. Check your network documentation. Fortunately, this exception destroys the zombied pools, and when the component restarts, the SqlClient .NET Data Provider creates a new pool. I heartily suggest you add an exception handler to deal with this contingency.
Effectively Using Connection Pools
So, what's the best way to use connection pools? Given the constraints that the .NET Data Provider implements, you might think that it's easy to overflow the pool or server with connections if you aren't careful what valves and switches you open and close. Remember that the limiting resources are the number of connections that a pool can handle and the number of connections available on the server. As I noted earlier, the default number of connections a pool can handle is 100; you can set a higher number, but when the number of connections within the pool reaches this value, the .NET Data Provider won't create more pooled connections. Ideally, your application should use a connection for a moment and release it back to the pool for other instances to share, so 100 connections per pool should be more than enough.
You can let the .NET Data Provider create any number of pools, again limited by the number of connections SQL Server allows according to your license or administrator settings. If some of your components are likely to compete for connections in a heavily used pool, it's a good idea to separate those components by creating two or more pools—one for each functional operation. One way to establish this separation is to set the application name in the ConnectionString to a unique value for each pool you want the .NET Data Provider to create. Watch out, though, because when a pool overflows (i.e., you exceed the set number of connections), your component performance grinds to a halt and your customers take their business to other sites.
Client/Server Pool Rules
Before you wade too deeply into connection pooling, you should reflect on what kind of application will benefit from connection management. Not all applications do. Client/server applications use pooled connections, but the specific instance of the client application is the only beneficiary. That's because each application runs in a unique process, so each client application gets its own pool. Multiple copies or instances of the same Windows Forms application don't share the same pool; however, when the application makes multiple connections (as applications often do), the .NET Data Provider manages all the connections in a common pool.
When you're working with client/server applications that need to manage the server-side DBMS more closely, the pool can be more of a hindrance than a help. In such cases, you might want the functionality of a like-new connection each time you use the Open method and an assurance that the connection actually closes when you use the Close method. Conversely, if the application repeatedly makes and breaks connections, the connection string remains unchanged, and you don't use transactions, the .NET Data Provider will create a pool for your connection. This means that the application will reuse your pooled connection whenever possible, and your application will run faster because the connection isn't being reopened and closed repeatedly.
Wading into the Web
Currently, the overwhelming focus at Microsoft is Web- and middle-tier architectures, in which a code snippet's lifetime is shorter than a rabbit's heartbeat and replicated just as quickly. These routines have to open a database connection, execute a query, and return a response very quickly—before the customer loses interest. Between one and n instances of the component simultaneously execute a process—dozens to thousands of times per second—with all the code snippets doing pretty much the same thing and using the same connection string.
If the component (such as a Web service) takes too long to finish using the connection, the .NET Data Provider might add more connections to the component's private pool. Also remember that each .NET assembly gets a unique PID. This means that each new assembly you run gets its own pool—regardless of the similarity of function or the fact that their ConnectionString values are identical.
Managing Lots of Swimmers
In any architecture, you need to carefully monitor two factors: the number of pools and the number of connections each pool manages. Fortunately, you can control both with code.
When your Web site is under heavy use, a good strategy for making sure you have enough connections is to open each connection just before you need it and close it as soon as you can—the just-in-time (JIT) strategy. ADO.NET can help in this because it opens connections automatically when you use the DataSet Fill or Update method. If you're using a DataReader, you have to open and close the connection yourself. Even if you use the CommandBehavior.CloseConnection option, you still have to close the DataReader to get the associated connection to close. Unlike VB 6.0, none of the .NET languages can guarantee your connection will close when a Connection object (such as SqlConnection) falls out of scope. If the connection is still open, you can't reuse it. If the Connection object is still open when it falls out of scope, it's probably lost forever. For example, the code segment that Listing 1 shows will "leak" a connection each time the procedure executes because the Connection isn't closed before the function that creates the Connection object is completed. This leak happens because the code can no longer reference the Connection object (which owns the pooled connection) after it falls from scope, and the .NET garbage collector won't necessarily clean up these orphaned objects. Although the code in Listing 1 works in VB 6.0 and COM-based ADO, the code doesn't work in ADO.NET.
How your application behaves when all available connections are in use is crucial to the effective performance of your Web site. A helpful tip is to remember that you can increase the value of the Connection Timeout connection-string argument or the Connection object's ConnectionTimeout property. You use one or both of these values to set the length of time ADO.NET waits for a connection to be released from the pool and made available to your code. If you set the value too high, the browser (or client) might time out and throw the System.InvalidOperationException before you get connected. If you set the value too low, your error handler will have to know how to retry the connection—possibly after asking the customer to be patient while your application deals with other requests.
Activating, Tuning, and Disabling the Connection Pool
When you're using the SqlClient .NET Data Provider to access SQL Server 7.0 and later, you need to create a new SqlConnection object and set its ConnectionString property. In the ConnectionString property, you choose the security setting you'll be using, point to the server, and set the communications protocol. Using this property is the only way to change the SqlClient .NET Data Provider's behavior and control how it interacts with the connection pool. By using the appropriate ConnectionString keywords, you can turn off pooling, change the size of the pool, and tune the pool's operations.
Understanding connection-pool options. Table 1 defines the SqlClient.SqlConnection.ConnectionString keywords that determine how the .NET Data Provider manages the connection pool for the specific connection you're opening. You can set these keywords on a connection-by-connection basis, which means that some Connection objects will be pooled and others won't be.
Turning off connection pooling. When you want more hands-on control of the server-side connection state or you want to debug a connection-pooling problem, using an unpooled connection makes sense, so it's nice to know that you can easily turn off connection pooling. If you're sure you want to turn off pooling, simply set the Pooling argument in the ConnectionString to False, and the .NET Data Provider won't pool the Connection object that you opened by using this option. The Open method will establish a new connection to the server (if one is available), and when you use the Close method, the .NET Data Provider will close the server connection immediately. The server can deny a connection-open request if you've exhausted the number of connections the sa allows or you exceed the number of connection licenses registered for the server. If you're turned away from the server for some reason, your code needs to display a message to tell your user to come back later or simply wait and retry.
Setting the maximum pool size. You use the Max Pool Size keyword to determine how many connections the pool will hold. By default, the .NET Data Provider permits 100 connections to accumulate in each pool. This doesn't mean that you start with 100 SqlConnection objects or 100 connections, only that you can't create more than 100 connections per pool. If you try to exceed the maximum limit you've set, your application will wait until you pass the ConnectionTimeout limit or until a connection becomes available in the pool, then raise an exception. Max Pool Size doesn't limit the number of pools you can create—that limit is determined by the number of SQL Server connections that your sa setting or license makes available (about 32,768 per cluster).
In some cases, you might want to leave one or more connected SqlConnection objects in a pool for an indefinite length of time. If so, set the Min Pool Size keyword to an appropriate value. Setting this value can help connection performance for an application that doesn't run often but can't wait for the .NET Data Provider to construct a new SqlConnection object and establish a connection. You can also use this technique as a way to reserve connections. For example, you might set a Min Pool Size value if you want your Internet application to be able to deal with a flood of queries that grab all the connections from the pool but still ensure that you have a private connection. Or suppose you're hosting several applications on the same server and one of those applications is getting hammered with queries; by using this technique, you can reserve some connections for each application.
After you close a connection in code, ADO.NET notifies the connection pooler to release the connection back to the pool, and after a time (between 4 and 8 minutes), a connection-pool routine closes the database connection. The Connection Lifetime ConnectionString keyword isn't what it appears to be. Microsoft included this option to handle a very special circumstance—when you're using clustered servers. Connection Lifetime helps release connections more quickly when several SQL Servers are working in a cluster. In nonclustered systems, the argument has no bearing on how long the connection remains in the pool.
As I mentioned, generally the connection-pool mechanism uses a random connection lifetime of between 4 and 8 minutes. After you close the ADO.NET Connection object, connections remain open for the randomly selected period of time after your application releases them. The .NET Data Provider calculates the pool lifetime starting from the time the pooled connection was created.
Jump In with Both Feet
The way that ADO.NET interfaces with the connection-pool mechanism is new—but connection pools have been around for several years now, so you should be fairly comfortable with how they work. However, if you're not using ADO.NET and the SqlClient .NET Data Provider, it's tough to manage connection pools. If you don't understand the pooling mechanism, there's a good chance you'll create applications that work for a while but fail under stress. Or you might create applications that exhibit substandard performance, but you can't determine why. With the information in this article, you can explore how best to manage your connections by using the built-in pooling mechanism.
What's missing from this article is information about how to monitor the pool. You need to know how many pools are being managed, how many connections are being held in each pool, and which applications or processes create these pools. I'll cover all this information and more in an upcoming article that discusses how to monitor the state of the pool by using existing utilities and your own code.