
ADO.NET 2.0: Smarter, Faster, Better

New features solve developers’ problems

As the November 7 release of Visual Studio 2005 and SQL Server 2005 approaches, we're seeing a flood of courses, technical articles, and marketing pieces pour out of the content providers like so many cars trying to get out of town on a Friday evening. Much of this new content unabashedly extols the virtues of ADO.NET 2.0, the new version of ADO.NET, and the tools you can use to create applications that reference it. Many of the articles simply highlight the long list of new, flashy features in ADO.NET 2.0, but I believe most developers care more about how these features will solve specific development challenges. Therefore, I'm going to introduce ADO.NET 2.0 by listing some important problems and explaining how the new release addresses them in smarter, faster, and better ways.

Smarter at Managing Connections


One of the uglier problems with ADO.NET has been how the data providers handle (or mishandle) the connection pool. For example, when a connection dies in ADO.NET 1.1, the pooling mechanism keeps the carcass around until some unsuspecting application tries to reuse it, and only then does the pooler toss out the corpse. However, the pooler leaves other dead connections for subsequent Open calls to trip over. ADO.NET 2.0 implements a new approach: Once it finds a bad connection, it flushes that connection and all other bad connections from the pool, which greatly simplifies your exception handlers. You can also add code to force the pool (or all the pools) to flush. ADO.NET 2.0 also replaces the connection pool performance counters. The new counters seem to work (the old ones clearly did not), so you can more accurately monitor your connection-pool status.
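Here's what that flush code looks like: a minimal sketch using SqlClient's new static ClearPool and ClearAllPools methods. The connection string is a placeholder.

```
using System.Data.SqlClient;

class PoolFlushDemo
{
    static void Main()
    {
        // Placeholder connection string; substitute your own server and database.
        string connString =
            "Data Source=.;Initial Catalog=Northwind;Integrated Security=True";

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            // ... work with the connection ...

            // Flush only the pool this connection belongs to.
            SqlConnection.ClearPool(conn);
        }

        // Or flush every pool the SqlClient provider owns.
        SqlConnection.ClearAllPools();
    }
}
```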

When you create a new connection in the Visual Studio 2003 IDE, you have a choice of OLE DB providers—but no .NET Data Providers. To address this problem, ADO.NET 2.0 exposes a new DbConnectionStringBuilder class to help developers build real .NET Data Provider–specific connection strings. Given the ability to use new .NET Framework APIs to list providers and servers, tool builders will have an easier time building ConnectionString dialog boxes.
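Here's a brief sketch of both halves: the SqlClient-specific SqlConnectionStringBuilder (one of the provider-specific DbConnectionStringBuilder subclasses) and the DbProviderFactories class, which enumerates the registered .NET Data Providers. The server and database names are placeholders.

```
using System;
using System.Data;
using System.Data.Common;
using System.Data.SqlClient;

class ConnectionStringDemo
{
    static void Main()
    {
        // Build a provider-specific connection string without error-prone
        // string concatenation. The values here are placeholders.
        SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
        builder.DataSource = "(local)";
        builder.InitialCatalog = "Northwind";
        builder.IntegratedSecurity = true;
        Console.WriteLine(builder.ConnectionString);

        // Enumerate the .NET Data Providers registered on this machine --
        // the list a tool builder would show in a ConnectionString dialog.
        DataTable providers = DbProviderFactories.GetFactoryClasses();
        foreach (DataRow row in providers.Rows)
            Console.WriteLine("{0} ({1})", row["Name"], row["InvariantName"]);
    }
}
```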

Some applications, especially those that have evolved from JET architectures, try (and fail) to use a single connection to post updates against rowsets that aren't fully populated. For example, developers might execute a SELECT statement by using a DataReader and, as they step through the rows, try to execute an UPDATE command through the same connection. Regardless of the wisdom (or lack thereof) of this approach, ADO.NET 2.0 and SQL Server 2005 now support Multiple Active Result Sets (MARS). Although I've yet to embrace this feature, Microsoft seems pretty proud of it. MARS lets you execute additional operations on a single connection (under guarded circumstances). I believe that simply opening an additional connection usually makes more sense, but MARS is off by default, so developers can choose whether to use it.
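If you do choose to use it, you opt in through the ConnectionString. Here's a minimal sketch of the scenario I just described: an UPDATE executing while a DataReader is still open on the same connection. The Northwind table and connection string are placeholders.

```
using System.Data.SqlClient;

class MarsDemo
{
    static void Main()
    {
        // MARS is off by default; you must opt in through the connection string.
        string connString = "Data Source=.;Initial Catalog=Northwind;" +
                            "Integrated Security=True;MultipleActiveResultSets=True";

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            SqlCommand select = new SqlCommand(
                "SELECT CustomerID FROM Customers", conn);
            using (SqlDataReader reader = select.ExecuteReader())
            {
                while (reader.Read())
                {
                    // With MARS enabled, a second command can run on the SAME
                    // connection while the DataReader is still open.
                    SqlCommand update = new SqlCommand(
                        "UPDATE Customers SET ContactTitle = 'Owner' " +
                        "WHERE CustomerID = @id", conn);
                    update.Parameters.AddWithValue("@id", reader.GetString(0));
                    update.ExecuteNonQuery();
                }
            }
        }
    }
}
```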

Handling Data Faster—For Smart and Dumb Stuff


To be fair, I accept that some developers really need to bring 100,000 rows to the client and manage the data in a DataTable. Microsoft accommodates these folks by redesigning the way ADO.NET constructs, stores, indexes, and fetches DataTable and DataSet objects. If you keep your DataTable smaller than 10,000 rows, you won't see much difference in performance. But developers who build gigantic DataTables will see a dramatic performance improvement—as much as 80 times faster (or so Microsoft tells me). This enhancement shouldn't encourage you to download the entire list of state Medicare subscribers into memory. There are better, faster ways to deal with large volumes of data—such as using the new SqlBulkCopy class to import the rows into a server-side table and process them there by using a stored procedure or even a new CLR-based procedure.

It also seems that lots of developers use a DataReader to fetch data (because they heard it was faster) but subsequently use their own code to load the data into a DataTable or DataSet. ADO.NET 2.0's DataTable and DataSet now expose the Load method to let developers directly load a DataTable from a DataReader. You can also expose a DataReader stream from an existing DataSet or DataTable by using the CreateDataReader method. Yep, that means the DataReader might return multiple rowsets accessed through the NextResult method.
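Here's a quick sketch of the round trip in both directions; the query and connection string are placeholders.

```
using System.Data;
using System.Data.SqlClient;

class LoadDemo
{
    static void Main()
    {
        string connString =
            "Data Source=.;Initial Catalog=Northwind;Integrated Security=True";
        DataTable table = new DataTable("Customers");

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand("SELECT * FROM Customers", conn);
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                // One call replaces the hand-written while (reader.Read()) loop.
                table.Load(reader);
            }
        }

        // Going the other way: expose the populated DataTable as a reader stream.
        using (DataTableReader tableReader = table.CreateDataReader())
        {
            while (tableReader.Read())
            {
                // process each row
            }
        }
    }
}
```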

But suppose someone asks you how to make an application run faster when moving 10,000 or more rows to the client. You (like I) might casually ask, "Why?" Often, it's because the customer wants to execute a bulk operation—but doesn't know it. Generally, such customer requests take one of two forms.

  1. The customer asks you to fetch a bazillion rows from a table, make a change, and send the rows back. Of course, some customers aren't aware of stored procedures that can usually make all the needed changes right on the server in a fraction of the time. I provide these folks with a tutorial about writing intelligent server-side UPDATE operations. But people use the tools they're comfortable with.
  2. The customer asks you to fetch a bazillion rows from another database and move them to SQL Server. Sometimes these folks want to execute logic on the data as it goes by, but typically they just want to import a lot of rows from a mainframe in their Cleveland office. In the past, I recommended that these folks use DTS or bulk copy program (BCP) for the operation. In ADO.NET 2.0, you can easily import data from any data source by using the SqlBulkCopy class, as the sketch after this list shows. You can bulk copy data exposed by any .NET data provider—SqlClient, OLE DB, ODBC, or any third-party provider—to SQL Server for processing. Once the data is on the server, I suggest altering or filtering the rows in place there. What takes hours or days to do with previous ADO versions takes seconds or minutes with ADO.NET 2.0.
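Here's roughly what that import looks like. In this sketch, the rows stream from a hypothetical ODBC source straight into a SQL Server staging table; the DSN, table names, and connection strings are all placeholders.

```
using System.Data;
using System.Data.Odbc;
using System.Data.SqlClient;

class BulkCopyDemo
{
    static void Main()
    {
        // Placeholder source: any .NET data provider works; here, ODBC.
        string sourceConnString = "DSN=LegacySource";   // hypothetical DSN
        string destConnString = "Data Source=.;Initial Catalog=Staging;" +
                                "Integrated Security=True";

        using (OdbcConnection source = new OdbcConnection(sourceConnString))
        {
            source.Open();
            OdbcCommand cmd = new OdbcCommand("SELECT * FROM Subscribers", source);
            using (IDataReader reader = cmd.ExecuteReader())
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destConnString))
            {
                bulkCopy.DestinationTableName = "dbo.Subscribers";
                bulkCopy.BatchSize = 10000;          // commit in chunks
                bulkCopy.WriteToServer(reader);      // stream the rows straight in
            }
        }
    }
}
```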

To illustrate this last point, I recently wrote an application to use the MSDN Index CD to find a DVD in my Universal collection. Unfortunately, the Microsoft Internet Explorer (IE)–based search tool that accompanies the Index CD takes several minutes for a simple search. I uploaded the MSDN Index data by using the ADO.NET 2.0 SqlBulkCopy class to move the ADO classic Advanced Data TableGram (ADTG)-persisted .rs files from the Index CD to SQL Server. After the files loaded, I created appropriate indexes and was able to reduce the query time from several minutes to less than 5 seconds. The bulk copy operation moved about 450,000 rows in less than 30 seconds. I also tried to import the XML data from the CD, but the results were appalling. Why did it take so much longer? Perhaps it's because XML is one of the most inefficient (but one of the most flexible) data-storage mechanisms.

Another feature not to be overlooked is the new ADO.NET 2.0 TableAdapter. You can best generate this class by using the new drag-and-drop mechanisms in the Visual Studio 2005 IDE. The TableAdapter class is designed to expose the new functionality of the DataTable object, which now implements many of the DataSet methods. The DataTable object lets you serialize an individual DataTable by using either XML or binary serialization. Add to that new binding mechanisms and rowset navigation controls, and you have a more powerful tool to build interactive applications. ASP.NET also gets some new ADO.NET support. For example, you can now automatically update the DataSets you build and bind by using new "full-duplex" data-management technology.
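For example, here's a minimal sketch of the new binary-serialization option on a standalone DataTable, which hangs off the new RemotingFormat property; the table schema and file name are placeholders.

```
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class SerializeDemo
{
    static void Main()
    {
        DataTable table = new DataTable("Customers");
        table.Columns.Add("CustomerID", typeof(string));
        table.Rows.Add("ALFKI");

        // New in 2.0: opt in to true binary serialization instead of XML.
        table.RemotingFormat = SerializationFormat.Binary;

        BinaryFormatter formatter = new BinaryFormatter();
        using (FileStream stream = File.Create("customers.bin"))
        {
            formatter.Serialize(stream, table);
        }
    }
}
```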

ADO.NET 2.0 Can Multi-Task—Almost


Since I anticipated that my MSDN Index search application might at times take quite a while to complete a complex search, I decided to try the new asynchronous DataReader features in ADO.NET 2.0 to display a progress bar. Although this detail was easy to implement, it didn't prevent lockups during long queries because executing a query and returning the rowset is a two-phase process. When you use the SqlClient.SqlCommand's BeginExecuteReader to start execution of your search query, control returns to your application immediately. At this point, you can display a progress bar or do whatever you need to do to entertain users so that they don't press CTRL-ALT-DEL in frustration. Once the returned IAsyncResult reports IsCompleted, you can call EndExecuteReader to access the DataReader stream—but the waiting isn't over. You still have to return the rows, which can take considerable time. ADO.NET doesn't provide asynchronous Load or Fill methods (which would be nice), so you'll have to resort to BackgroundWorker threads. Although these threads aren't tough to figure out, leaving out asynchronous Fill and Load seems like an oversight to me. Incidentally, this new feature almost catches up with COM-based ADO—which also supports asynchronous Connection Open.
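Here's the shape of the two-phase pattern I just described: a console stand-in for the progress bar while the query executes, followed by the (still synchronous) fetch. The connection string (which must include Asynchronous Processing=true), the database, and the query are placeholders.

```
using System;
using System.Data.SqlClient;
using System.Threading;

class AsyncDemo
{
    static void Main()
    {
        // Asynchronous Processing=true is required, or BeginExecuteReader throws.
        string connString = "Data Source=.;Initial Catalog=MSDNIndex;" +
                            "Integrated Security=True;Asynchronous Processing=true";

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            // Hypothetical long-running search query.
            SqlCommand cmd = new SqlCommand(
                "SELECT Title FROM IndexEntries WHERE Title LIKE '%ADO%'", conn);

            // Phase one: start the query; control returns immediately.
            IAsyncResult result = cmd.BeginExecuteReader();
            while (!result.IsCompleted)
            {
                Console.Write(".");   // stand-in for a progress bar
                Thread.Sleep(250);
            }

            // Phase two: fetching the rows still blocks, just as the text warns.
            using (SqlDataReader reader = cmd.EndExecuteReader(result))
            {
                while (reader.Read()) { /* process each row */ }
            }
        }
    }
}
```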

ADO.NET 2.0 Makes Smarter Updates


COM-based ADO "classic" implemented batch updates years ago, and ADO.NET 2.0 finally includes this feature. This means you'll be able to make a dozen (or a thousand) changes to a DataTable and get ADO.NET to apply those changes in far fewer round trips to the server—perhaps just one. To support this feature, ADO.NET exposes a new set of events that provide highly granular control of the operations and the exceptions that can occur.
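The knob that controls the batching is the DataAdapter's new UpdateBatchSize property. Here's a minimal sketch, assuming a Northwind-style table; note that a batched command must have its UpdatedRowSource set to None.

```
using System.Data;
using System.Data.SqlClient;

class BatchUpdateDemo
{
    static void Main()
    {
        string connString =
            "Data Source=.;Initial Catalog=Northwind;Integrated Security=True";

        using (SqlConnection conn = new SqlConnection(connString))
        {
            SqlDataAdapter adapter = new SqlDataAdapter(
                "SELECT CustomerID, ContactTitle FROM Customers", conn);
            SqlCommandBuilder builder = new SqlCommandBuilder(adapter);

            DataTable table = new DataTable();
            adapter.Fill(table);

            foreach (DataRow row in table.Rows)
                row["ContactTitle"] = "Owner";   // a thousand changes, say

            // Batching requires an explicit command whose UpdatedRowSource is None.
            adapter.UpdateCommand = builder.GetUpdateCommand();
            adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;

            // 0 means "send all pending changes in one batch"; any value
            // greater than 1 groups that many commands per round trip.
            adapter.UpdateBatchSize = 0;
            adapter.Update(table);
        }
    }
}
```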

Unless this feature gets cut (and it's on the chopping block as I write), ADO.NET 2.0 might also improve the way the CommandBuilder constructs action commands to handle concurrency. By exposing the new ConflictOption property, ADO.NET 2.0 lets you choose CompareAllSearchableValues (the behavior you get in ADO.NET 1.1), CompareRowVersion (which checks the RowVersion—timestamp—for concurrency tests), or OverwriteChanges (which forces the change by eliminating the WHERE clause from the UPDATE statement). This added flexibility means the CommandBuilder can better match your concurrency-management approach. I don't see that using the CommandBuilder is any smarter when you're working with complex SelectCommand syntax.
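Assuming the property ships as described, usage would look something like this sketch; the query (including the hypothetical RowVer timestamp column) and connection string are placeholders.

```
using System.Data;
using System.Data.SqlClient;

class ConflictDemo
{
    static void Main()
    {
        // The SELECT must include a timestamp column for CompareRowVersion.
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT CustomerID, ContactTitle, RowVer FROM Customers",
            "Data Source=.;Initial Catalog=Northwind;Integrated Security=True");
        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);

        // Test concurrency against the row-version (timestamp) column only,
        // instead of comparing every searchable column in the WHERE clause.
        builder.ConflictOption = ConflictOption.CompareRowVersion;
    }
}
```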

ADO.NET 2.0 and SQL Server Work More Closely


SQL Server 2005 Express is Microsoft's replacement for the MSDE version of SQL Server. One of ADO.NET 2.0's more interesting features is its ability to open a SQL Server database (especially targeted toward SQL Server Express instances) simply by pointing to a SQL Server .mdf database file. Although accessing a database through .mdf files was supported in ADO.NET 1.0, ADO.NET 2.0 now permits the ConnectionString to contain a relative path to the .mdf file (through the |DataDirectory| substitution string), which means the path can live in your application configuration settings. In addition, when you use the User Instance=True option, SQL Server copies the master, model, and tempdb databases, as well as your application database, to the current user's private data area. This move greatly simplifies database access and security and prevents damage to a shared master database and other system databases.
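Here's what such a ConnectionString looks like in a short sketch; the instance name and .mdf file name are placeholders.

```
using System.Data.SqlClient;

class UserInstanceDemo
{
    static void Main()
    {
        // |DataDirectory| resolves to the application's data directory at
        // run time, so the path to the .mdf file can stay relative.
        string connString =
            "Data Source=.\\SQLEXPRESS;" +
            "AttachDbFilename=|DataDirectory|\\Inventory.mdf;" +  // hypothetical file
            "Integrated Security=True;User Instance=True";

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();   // spins up a private user instance on first open
        }
    }
}
```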

Although ADO.NET 2.0 still doesn't support server-side cursors, it does employ technology that might be a better match for many applications. Imagine having SQL Server fire an event in your application when any row in a specific rowset changes. That's exactly what happens with the new SqlNotificationRequest class when you're working with SQL Server 2005's new query notifications (which ride on the Service Broker infrastructure). This immediate notification lets developers refresh locally cached data when server-side data changes, and this technology can work with Windows Forms or ASP.NET applications.
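SqlNotificationRequest is the low-level interface; for a taste of the mechanism, here's a sketch using the related higher-level SqlDependency class, which wires the notification to an ordinary event handler. It assumes Service Broker is enabled on the target database, and the query and connection string are placeholders (query notifications require explicit column lists and two-part table names).

```
using System;
using System.Data;
using System.Data.SqlClient;

class NotificationDemo
{
    static void Main()
    {
        string connString =
            "Data Source=.;Initial Catalog=Northwind;Integrated Security=True";

        // Start the listener that receives change notifications for this AppDomain.
        SqlDependency.Start(connString);

        using (SqlConnection conn = new SqlConnection(connString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT CustomerID, ContactTitle FROM dbo.Customers", conn);

            SqlDependency dependency = new SqlDependency(cmd);
            dependency.OnChange += delegate(object sender, SqlNotificationEventArgs e)
            {
                Console.WriteLine("Server data changed: " + e.Info);
            };

            // The command must actually execute for the notification to register.
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read()) { /* cache rows locally */ }
            }
        }
    }
}
```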

ADO.NET also supports all of the new SQL Server data types, including the large-value BLOB and CLOB types (varbinary(max), varchar(max), and nvarchar(max)) as well as Common Language Runtime (CLR) user-defined types (UDTs). There's even a new SqlClient interface for use in CLR-based stored procedures, functions, aggregates, and triggers.
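That in-process interface lives in the Microsoft.SqlServer.Server namespace. Here's a minimal sketch of a CLR stored procedure that uses the context connection and the SqlPipe to stream results back to the caller; the table name is a placeholder.

```
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public class StoredProcedures
{
    [SqlProcedure]
    public static void GetCustomerCount()
    {
        // "context connection=true" runs on the caller's connection, in-process.
        using (SqlConnection conn = new SqlConnection("context connection=true"))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT COUNT(*) FROM dbo.Customers", conn);

            // SqlPipe sends the results straight back to the calling client.
            SqlContext.Pipe.ExecuteAndSend(cmd);
        }
    }
}
```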

ADO.NET 2.0 No Longer Requires the MDAC Stack


In the past, developers and architects would take a deep breath and hope for the best as a new version of the MDAC stack was announced. They knew there was a good chance that it could break some of their deployed applications. With ADO.NET 2.0, you don't have to worry (as much) about breaking existing code because you no longer need the MDAC stack (which contains the COM-based ADO and selected netlib DLLs) to deploy ADO.NET applications. When you're building .NET Framework applications, the application executables can reside in a single directory that's not affected by new code installed after deployment. You can still mark selected DLLs to be installed and registered in common areas on the target system or in the Global Assembly Cache (GAC).

It's clear that Microsoft has spent considerable time working on the new ADO.NET 2.0 features and integrating them into Visual Studio and SQL Server 2005. It's also clear that Microsoft is serious about strongly typed data, data binding, and reporting. These new features can help developers implement smarter, safer, and faster designs with less code.
