The word "performance" has several diverse and subtle implications. When people talk about how well something performs, one thing they might mean is how much work gets finished during a period of time. For example, a well-performing engine runs quietly and produces lots of torque. You can apply this same yardstick to your development team: A well-performing development team also works quietly and produces lots of good code. To me, performance means at least these two things—how well my code does its job and how well my team and I do our jobs. The tips in this article can help you do both—write code faster and write faster code—and do so quietly, with fewer noisy errors. Primarily, the tips pertain to ADO, especially how you use it to access SQL Server. But along the way, I touch on a few more general COM tips that apply to any Visual Basic (VB) code you write.
I've spent quite a bit of time working out which SQL Server data-access coding techniques, which architectures, and which development practices make for the best performance. In some cases, a technique has little bearing on the overall performance of the application or component unless you multiply its savings with repetition. For example, in a client/server application, you might save a second or two by not specifying an ODBC Data Source Name (DSN) for your connection. This savings might have little overall effect on the application's suitability or performance, but if you apply that same technique to a middle-tier component in which connections are made and broken several hundred (or thousand) times per minute, hour, or day, this technique can significantly affect how the system performs. For each technique I discuss, be sure to consider the multiplication factor—how many times your system executes this code during a period of time.
When beginning your search for ways to improve performance, consider where your application (component or Active Server Pages—ASP—code) spends most of its time waiting and processing. If you discover your application is spending a lot of time waiting for the Open or Execute method to complete, you need to take a hard look at your server-side query strategy. All data-access interfaces—including ADO—wait at the same speed for results. For example, if you have a query that takes SQL Server 20 seconds to execute, no matter which interface you use to execute the query, none will return with the result any faster than another. Although some interfaces open connections faster and some process the resultsets faster, none can affect the speed at which the engine compiles and executes a query. So, all the ADO techniques in the world might not help your performance if you're submitting "challenged" queries—for example, you haven't optimized your indexes, you haven't used stored procedures, the server is overloaded, or you're asking the application to return too many rows. No other tuning techniques will significantly help your overall performance until you solve these basic query problems. SQL Server's Query Analyzer provides a great way to analyze query performance. It can graphically display how the query is being executed and can suggest ways to improve performance.
After you're comfortable that your queries are efficient, you can use the techniques I discuss in this article to tune your ADO code even further. These tips will help you streamline the process of establishing and maintaining connections, build and submit queries that perform more quickly and efficiently, improve performance during result processing, and compare data-access strategies.
Establishing the Connection
In a client/server application, you have several ways you can hide the time it takes to establish an initial connection so that the user doesn't have to wait for the application to start while the connection is opened. First, try connecting asynchronously. With asynchronous connections, ADO starts the connection process but doesn't wait for it to complete before returning control to your application—thereby permitting your application to execute most of its initialization process and complete the Form_Load event more quickly. If you can close the connection and reconnect in less time than it takes for the connection pool to release your connection, your connection is virtually instantaneous. But in many cases (especially when you don't have many users), simply leaving the connection open makes the most sense. I recommend you leave the Connection object open in a middle-tier or ASP component if it makes repeated queries to the database.
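As a minimal sketch of an asynchronous open (the server name MyServer and the Pubs catalog are placeholder assumptions—substitute your own), note that the Connection object must be declared WithEvents at module level so the ConnectComplete event can fire:

```vb
' Module-level declaration in a form, so ADO events are raised
Dim WithEvents cn As ADODB.Connection

Private Sub Form_Load()
    Set cn = New ADODB.Connection
    cn.ConnectionString = "Provider=SQLOLEDB;Data Source=MyServer;" & _
                          "Initial Catalog=Pubs;Integrated Security=SSPI"
    cn.Open Options:=adAsyncConnect   ' returns immediately
    ' ... continue initializing the form while the connection completes
End Sub

Private Sub cn_ConnectComplete(ByVal pError As ADODB.Error, _
        adStatus As ADODB.EventStatusEnum, _
        ByVal pConnection As ADODB.Connection)
    ' Fires when the asynchronous open finishes (or fails);
    ' check adStatus before enabling data-bound UI
End Sub
```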
Another way to improve connection performance is to avoid using ODBC with DSNs. ODBC is now in Quick Fix Engineering (QFE) mode at Microsoft, which means that the company will do no further work on ODBC or its drivers unless it discovers a major bug. ODBC DSNs are also a concern when you're managing performance and deployment. DSNs have to be installed on the client system, require a registry lookup, and can take longer to establish than OLE DB connections—especially if you hard-code the ConnectionString. In real terms, eliminating the DSN alone yields only minor savings. The larger cost is the connection itself: establishing one from scratch typically takes 2 to 5 seconds when no matching connection is available from the pool. If your application has to constantly connect and reconnect, these savings add up.
When establishing your connection, you choose the data provider. Microsoft recommends that you use the native OLE DB provider for SQL Server (SQLOLEDB) instead of the default OLE DB provider for ODBC. My experience shows fewer unpleasant surprises with the latest native OLE DB providers than with their older ODBC counterparts. In any case, be sure to test your application thoroughly before committing to a new provider—your code's performance, supported features, and behavior might vary.
In the middle tier and in ASP, you can't (in a practical sense) hold a connection open and still create a scalable component—at least not between invocations. Typically, a component or ASP page is loaded and discarded frequently as Microsoft IIS references and releases instances. Because the ADO-based code must establish, use, and release a connection each time the code is executed, strategies to minimize connection setup can help in clearly measurable terms. For these scenarios, the connection/session pool plays a big role in how quickly you can get connected. If you code your Connection object's ConnectionString property correctly (i.e., by using the same server, initial catalog, login ID, and other parameters each time), the chances of a connection being open and available are good. If a matching connection is found in the pool, the time to connect (or reconnect) will be virtually nil (usually less than 250ms).
However, if your ADO (or VB) code doesn't release your Connection object or if you change the ConnectionString from instance to instance, OLE DB has to establish a new connection every time. If that happens, you'll soon run out of connections as the available pool dries up. To make sure the connection is freed, set the Connection object to Nothing after closing it. Also, don't use a ConnectionString in the Recordset Open method; open the Connection object independently so that the Connection object is easy to reference when you close it and set it to Nothing.
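Putting those rules together, the connect-use-release pattern might look like this sketch (the connection string values and the Authors query are assumptions—substitute your own server, catalog, and tables):

```vb
' Use an identical ConnectionString every time so OLE DB session
' pooling can hand back an existing connection.
Const CONN As String = "Provider=SQLOLEDB;Data Source=MyServer;" & _
                       "Initial Catalog=Pubs;Integrated Security=SSPI"

Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset

Set cn = New ADODB.Connection
cn.Open CONN                    ' fast if a pooled connection matches

' Open the Connection independently--don't bury a ConnectionString
' in the Recordset Open call
Set rs = New ADODB.Recordset
rs.Open "SELECT Au_LName FROM Authors", cn, _
        adOpenForwardOnly, adLockReadOnly, adCmdText

' ... process rs ...

rs.Close
cn.Close                        ' returns the connection to the pool
Set rs = Nothing
Set cn = Nothing                ' release the object references too
```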
Building and Submitting a Query
The whys and why-nots of query construction make up a complex subject. However, some basic guidelines can help make constructing an efficient query a smoother process. Generally, use queries that don't waste the server's time. Here are several ways to build better, more efficient queries.
Don't force SQL Server to recompile and construct a query plan for your query each time it's executed. An easy way to avoid this repetition is to create and use parameter-based stored procedures. Don't bother with the ADO Command object's Prepared property—it doesn't work correctly. When creating stored procedures, you can further help ADO performance by eliminating unneeded "rows affected" values from the returned resultsets—simply add SET NOCOUNT ON to the stored procedure.
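As a sketch, such a procedure (GetCustomer and the Customers table here are hypothetical names for illustration) might be created from VB like this—note the SET NOCOUNT ON:

```vb
' One-time setup: create a parameterized stored procedure so SQL Server
' compiles the plan once; SET NOCOUNT ON suppresses the "rows affected"
' messages that ADO would otherwise have to process.
cn.Execute "CREATE PROC GetCustomer @ID int AS " & _
           "SET NOCOUNT ON " & _
           "SELECT Name, City FROM Customers WHERE ID = @ID", , _
           adExecuteNoRecords
```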
Make as few round-trips to the server as possible. If you have several related operations to perform, combine them into a stored procedure or even one multipart query that you can execute as a script on the server. Avoid using methods (such as Refresh) and improper Parameters collection references that force ADO to make unnecessary round-trips to the server.
In client/server applications, construct Command objects once, not each time they're used. Reset the Command parameter values, and re-execute the Commands as necessary.
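For example, here's a sketch that builds a Command once against a hypothetical AuthorsByState stored procedure, then reuses it by resetting just the parameter value:

```vb
' Build once (e.g., during initialization)
Dim cmd As ADODB.Command
Set cmd = New ADODB.Command
With cmd
    .ActiveConnection = cn              ' an open ADODB.Connection
    .CommandText = "AuthorsByState"     ' hypothetical stored procedure
    .CommandType = adCmdStoredProc
    .Parameters.Append .CreateParameter("@State", adVarChar, _
                                        adParamInput, 2)
End With

' Reuse many times: reset the parameter value and re-execute
Dim rs As ADODB.Recordset
cmd("@State") = "CA"
Set rs = cmd.Execute
```

Building the Parameters collection yourself (instead of calling Refresh) also avoids the extra round-trip to the server that the previous tip warns about.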
Try to get ADO to generate a direct remote procedure call. Watch SQL Server Profiler to see when ADO asks SQL Server to run a procedure by leveraging the sp_executesql procedure. This procedure isn't particularly inefficient, but if you see SQL Server being asked to create a temporary procedure in one step, execute the procedure in another step, and drop the same procedure in a third step, you've hit a performance snag.
Whenever your query doesn't return a rowset, be sure to use the adExecuteNoRecords option to tell ADO to bypass any code required to set up and receive a rowset (in Recordset form). You can pass this option to the Execute method or as a Command option.
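For instance, an action query sketched with the Execute method (the Authors table and Contract column are assumptions for illustration):

```vb
' No rowset comes back, so tell ADO not to construct a Recordset
cn.Execute "UPDATE Authors SET Contract = 1 WHERE State = 'CA'", , _
           adCmdText Or adExecuteNoRecords
```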
Don't use Command objects when you're executing simple rowset-returning stored procedures. All stored procedures (and Command objects) show up as COM methods of the Connection object. Having procedures as Connection methods can yield significant performance benefits as well as simpler code. Although this technique doesn't help with stored procedures that echo Return Status values or Output parameters, it can be helpful for action queries and other queries that return one or more rowsets. With the procedures as Connection methods, you can pass the stored procedure's input arguments as method arguments, and if a Recordset is returned, you can reference the Recordset as the last argument in the method call. For example, the following ADO statement executes a stored procedure called "Fred" that has two input parameters and a returned Recordset:
MyConnection.Fred "InputArg1", 2, myRecordset
When coding, don't expect VB to auto-complete the stored procedure or Command object name as a recognized method of the Connection object. COM doesn't resolve these until runtime.
Don't return a rowset unless you must. ADO constructs a Recordset object whenever the query being executed returns rows. Recordset objects are expensive to construct, so avoid them whenever possible. After it's built, the Recordset provides powerful features, and if you find them useful, you can use a Recordset to pass data back from SQL Server. Keep in mind that it's possible to execute queries that return results but no rows. For example, you can return integer values by using the Return Status parameter. Also, you can return up to 1000 Output parameters (from SQL Server) instead of a rowset, which would require construction of a Recordset.
Whenever possible, execute action queries (INSERT, UPDATE, and DELETE statements or stored procedures that perform these operations) instead of using updateable Recordset cursors. In this case, make sure you use the Execute method and its adExecuteNoRecords option to ensure that ADO knows that it doesn't need to construct a Recordset.
Don't ask the server to sort unless necessary. In many cases, sorting reasonably sized Recordset objects is faster after they arrive at the client. Letting the ADO client sort the rows in the Recordset also gives the client application more flexibility to use alternative user-selected sort sequences.
Know your index structure before you write your queries. You can improve fetch performance by creating appropriate indexes and properly tuning your query syntax to leverage these indexes. Use the Query Analyzer to help you decide when and whether you need more indexes.
Don't return too many rows. Too often, bulky resultsets severely limit your executable's performance. Just return the rows you need right now, and fetch more rows later if the client needs them. Limit the scope of the query with parameter-driven WHERE clauses or through judicious use of TOP N queries.
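For example, a sketch that limits the result both ways (the Titles table, the price filter, and the 50-row cap are illustrative assumptions):

```vb
' Fetch only the rows the user needs right now
Dim rs As ADODB.Recordset
Set rs = New ADODB.Recordset
rs.Open "SELECT TOP 50 Title, Price FROM Titles " & _
        "WHERE Price < 20 ORDER BY Price", _
        cn, adOpenForwardOnly, adLockReadOnly, adCmdText
```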
Don't return too many columns. Avoid using SELECT *, which tells SQL Server to return all columns, no matter how many exist. Just select the specific columns you need so that you won't get a nasty surprise when someone adds more columns to the tables.
Avoid using cursors. If you must indulge in cursors, don't choose a cursor that requires more resources than necessary. Don't ask for scrolling, updateability, or cached data if you don't need it.
Tell ADO what you want it to do in more detail. Be sure to include the CommandType option when opening a Recordset or building a Command object. This prevents ADO from "guessing" your intentions. You can save round-trips to the server and make your code more stable.
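For example, specifying adCmdStoredProc up front (ListAuthors is a hypothetical parameterless procedure) spares ADO a round-trip to ask the server what kind of object the string names:

```vb
' CommandType is explicit, so ADO doesn't have to guess
rs.Open "ListAuthors", cn, adOpenForwardOnly, adLockReadOnly, _
        adCmdStoredProc
```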
Also, learn to use your diagnostic tools to determine how much time code running on the server and in your application is taking—and where it's spending the time. SQL Server Profiler is an invaluable tool in this process. It can clarify what your code is asking the server to execute, and it can highlight in a flash sloppy query construction or the wrong choice of command properties. Query Analyzer can also help tune queries by diagramming how SQL Server will execute a query and suggesting ways to improve the query. Query Analyzer can even implement those suggestions (by adding or removing indexes, for example) at the click of a button.
Processing the Results
After the rows have arrived, your code can spend a considerable amount of time processing the resultset. Each architecture (client/server, middle-tier, and ASP) has techniques for optimizing code in this phase, but here are several tips that can markedly improve performance.
One of the most common mistakes I see in code is the use of late binding in referring to Recordset Field.Value items. Because your code references the Value property so often and many Field objects usually exist for you to reference, the multiplication factor plays a big role—any improvement here will yield great improvement in performance. The reason some developers use late binding is that they want to be able to visually identify which column the SELECT statement is referring to. To this end, many people use the quoted-literal technique. For example, to refer to a Field "Cows" in the Fields collection of the Recordset RS, you might use

RS("Cows")

or

RS.Fields("Cows").Value
The latter technique explicitly refers to the Value property of a specific member of the Recordset Fields collection. The technique is a little faster and will be more upwardly compatible if you migrate this code to Visual Basic .NET. Another variation on the quoted-literal technique is the use of the bang (!) operator:

RS!Cows
Both of these techniques take considerably longer for COM to resolve than do early-binding techniques because they force COM to resolve the reference to the Value property at runtime instead of at compile time. Every reference to this object requires a similar serial lookup behind the scenes.
However, with late-binding techniques, there's no question about which column you're referring to. If you simply coded
RS(0) ' Point to the first column (Fields collection member)
COM could resolve this reference by ordinal instead of performing a string lookup at runtime, so the code would run faster, but only someone who knew which columns the query would return and in what order would know what this expression referred to. If the developer doesn't have control over the query source, as is often the case, this approach can be problematic. To determine which SELECT column RS(0) refers to, you'd have to figure out which SELECT statement was used to generate the Recordset and count the returned columns.
However, a couple of alternatives exist that give you both quick runtime referencing and easy human readability. One approach requires some discipline: In this case, the developer creates and publishes an enumerated list of the columns the query returns. If the query were changed to return additional columns or the columns changed in order, the developer would have to correct and redeploy the enumeration. Keeping enumerations matched to the underlying SQL queries can be somewhat tough to administer. For example, to clearly and quickly identify which columns your ADO code is referencing, you could couple the following SELECT statement with the following enumeration:
SELECT CatName, CatType, CatSize from Cats Where...

Enum enuCatsQuery
    CatName
    CatType
    CatSize
End Enum
Note that the columns the SELECT statement returns match those you specified in the enumeration. Then, when it's time to reference the Recordset Fields collection, you can execute the following code:
strMyName = RS(enuCatsQuery.CatName)
This way, the code is "human readable" and still resolves the column by fast ordinal lookup, so it's considerably faster than the quoted-literal approach.
But you have another way to avoid late binding. A long discussion on one of the VB discussion list servers resulted in an approach I call prebinding, which combines two techniques. This approach won't help in cases in which you reference the Field object only once, but in client/server applications, prebinding works nicely. With this technique, you create individually named Field objects and set these objects to the Recordset object's Fields collection items. To code it, you first set up a named Field object for each Field you'll manage:
Dim fldName As ADODB.Field
Dim fldType As ADODB.Field
Dim fldSize As ADODB.Field
You'll incur some overhead in creating individual Field objects, but you have to weigh this one-time penalty against its benefit in dramatically improved performance.
After you've opened the Recordset, you set the named Field objects just once to refer to the desired columns in the SELECT query:
If fldName Is Nothing Then
    Set fldName = RS!CatName
    Set fldType = RS!CatType
    Set fldSize = RS!CatSize
End If
You can use quoted string referencing here, or even the bang operator. Because you're using them only once, they won't make much difference in performance. Then, when it's time to reference the Field objects (post query), simply use the prebound variables:
strName = fldName
strType = fldType
strSize = fldSize
This prebinding approach can yield even better performance than you get with the ordinal reference (RS(0)) approach.
Client/Server, Middle-Tier, and ASP Strategies
When you start implementing code in other than client/server situations, you have another set of considerations to include in your search for performance. Some of these have nothing to do with ADO—they're related to COM. A recent Microsoft white paper states that it's faster to execute an ADO operation (Connect, query, process) in a Windows 2000 ASP page than to call a COM component to execute the same code. This finding is unsurprising; when you call an external COM component from VB (a piece of code outside your process space), the work going on behind the scenes to access the COM component and pass control to it is amazingly complex and slow. You don't have to avoid calling COM components to run your ADO code, but you can't simply encapsulate individual ADO operations into lots of little COM components and call them one at a time when you need them. Instead, try building an entire logic tree in the COM component so that in a single invocation, the COM component can perform most (if not all) of the ASP functionality in one round-trip—or as few trips between your ASP code and COM as possible. I think you'll find that running ADO (or any) code in binary (e.g., the COM component) is faster than running (ASP-) interpreted code. So, you need to find ways to avoid the expense of getting to and from the COM component.
When you can't leave a Command object or prebound Field object around to use and reuse as needed, consider techniques that avoid unnecessary object creation. In this case, calling stored procedures as Connection methods makes even more sense. Using early binding to reference Field objects can also help performance. And remember to clean up after yourself: Close the Recordset and Connection objects, and set them to Nothing when you're finished.
For best code and coder performance, remember these basics: Leverage the connection pool and asynchronous connections, reduce the number of round-trips your ADO code makes, choose an early COM-binding technique, avoid expensive and unnecessary ADO objects such as the Recordset and Command objects, and use the Return Status or Output parameters instead of rowsets. Always try to write intelligent queries and capitalize on stored procedures whenever possible. And tell ADO what you want it to do in detail to prevent it from having to guess—specify an explicit ADO CommandType, and use options such as adExecuteNoRecords.
I suggest you take this and all other programming advice with a grain of salt. The work we do, the code we write, and the systems we create are very complex, and many constantly changing factors influence them. With this in mind, if you see a technique here that looks interesting, try it out. If it works, implement it, and test it again. If it still works, great. If it doesn't, let me know what you tried and what did or didn't work.