In the early days of databases, developers needed an intimate knowledge only of the database product they were using. But database products and technologies have advanced quickly. From relational databases to nonrelational data stores such as email and file systems, data-access technologies have had to keep up with rapid technology changes. And with the advent of client/server and multitier application architectures, developers must now understand a variety of data-access technologies. Most developers have spent years learning acronyms such as ODBC, DAO, RDO, OLE DB, ADO, and RDS. Now Microsoft has introduced the .NET Framework and, with it, another new data-access technology: ADO.NET.
As we become immersed in each new technology advance, we often forget how data-access technology has evolved and the rationale behind each technology. Understanding the progression of these technologies—from ODBC to ADO.NET—can help you choose the appropriate technology and optimize it for your organization.
In the most basic kind of database design, applications rely on only one database. In such a simple setup, an application developer can program directly against the database system's interfaces. Although this approach provides a quick and efficient way to access data, it often poses a big problem when a business expands and the developer must scale up the application. The one-database approach also means that each off-the-shelf application must have a different version to support each database. As businesses change, grow, and merge, applications must access multiple databases running on different platforms.
ODBC technology provides a common interface for accessing heterogeneous SQL databases. ODBC uses SQL as a standard for accessing data. Figure 1, page 30, shows a diagram of a standard ODBC architecture. This interface provides maximum interoperability: One application can access different SQL database management systems (DBMSs) through a common set of code. Thus, a developer can build and distribute a client/server application without targeting a specific DBMS.
The developer can add database drivers to link the application to the user's choice of DBMS. As Figure 1 shows, the driver manager provides the intermediate link between the application and the databases. The ODBC interface contains a set of functions that the drivers of each DBMS implement. When an application changes its DBMS, the developer simply replaces the old driver with the new DBMS's driver and the application can work as usual—without the need for code modifications.
DAO and RDO
ODBC uses low-level interfaces, so C and C++ programmers are the people who really benefit most from ODBC technology. Visual Basic (VB) programmers don't have an easy way to access the ODBC interfaces. Before VB 6.0, developers had to rely on a higher-level mode of data access. Figure 2, page 30, shows how VB programmers use Data Access Object (DAO) to access databases.
DAO is based on Microsoft Jet—the database engine for Microsoft Access. Jet was the first object-oriented interface for connecting to Access. Applications that use Access can use DAO to directly access a database. Because DAO is modeled closely after Access, using DAO is the fastest and most efficient way to connect to an Access database. DAO can also connect to non-Access databases such as SQL Server and Oracle. DAO uses ODBC, but because DAO is designed specifically to talk to the Jet engine, Jet translates calls between DAO and ODBC. This extra translation step results in slower connections when you're using databases other than Access.
To overcome this limitation, Microsoft created RDO. Figure 3, page 30, shows how RDO accesses ODBC APIs directly without needing to go through the Jet engine. Before long, Microsoft introduced ODBCDirect, an extension of DAO that makes use of RDO in the background. Figure 4, page 30, shows how ODBCDirect lets existing DAO applications access databases without the performance penalty that the Jet engine creates.
Over the years, ODBC has become the standard for client/server database access. ODBC provides a standards-based interface that requires SQL processing capabilities and is optimized for a SQL-based approach. However, what happens if you want to access data in a nonrelational data source that doesn't use SQL (e.g., Microsoft Exchange Server, which doesn't store data relationally)?
Enter OLE DB. OLE DB builds on ODBC and extends the technology to a component architecture that delivers higher-level data-access interfaces. This architecture provides consistent access to SQL, non-SQL, and unstructured data sources across the enterprise and the Internet. (In fact, for access to SQL-based data, OLE DB still uses ODBC because it's the most optimized architecture for working with SQL.) As Figure 5, page 32, shows, OLE DB consists of three components: the data consumer (e.g., an application); the data provider, which contains and exposes data; and the service component, which processes and transports data (e.g., query processors, cursor engines). OLE DB is one API that operates against SQL data sources and non-SQL data sources such as mail and directories.
OLE DB provides binding for C and C++ programmers and programmers who use other languages that contain C-style function calls. Some languages such as VB and VBScript don't provide pointer data types (address variables). Hence, they can't use C-style binding, and they can't make direct calls to OLE DB.
To add to the confusion, Microsoft introduced another data-access object model: ADO. ADO takes the objects based in DAO and RDO and provides a much simpler object model than DAO and RDO (albeit with some redundant functionality as you can now perform an operation in more than one way). The object hierarchy in ADO is much flatter than that in DAO. ADO contains several built-in objects that simplify the task of accessing data from data stores.
Figure 6, page 32, shows the many paths that applications can take to connect to databases. For example, a VB programmer can use ADO to connect an application to an OLE DB provider. If the database doesn't support OLE DB, the application can connect through ODBC. A Visual C++ (VC++) programmer can use ADO or connect directly through OLE DB.
An Example in ADO
Let's take a look at a simple example that shows how ADO works. Listing 1, page 32, shows how you might use a typical Recordset object—the central object in ADO. The Recordset object represents a set of records (much like a table) and supports cursor types such as adOpenForwardOnly, adOpenKeyset, adOpenDynamic, and adOpenStatic. The cursor can be on either the server side (which is the default) or the client side.
To access a record, ADO needs to scan a Recordset sequentially. And to access multiple tables, you need to perform a JOIN query to return the results as a Recordset. Although the Recordset object supports disconnected data access, ADO is still designed primarily for connected data access. This connected mode of access ties up valuable resources on the server side. In addition, to transmit a Recordset, you must use COM marshalling. COM marshalling is the process of converting data types, and this conversion takes extra system resources.
Starting with ADO 2.1, Microsoft added XML support to the ADO object model, which lets you save a Recordset as an XML document. However, it wasn't until ADO 2.5 that some of the restrictions and limitations of XML support in ADO 2.1 (e.g., persisting of hierarchical Recordset objects) were lifted. Although ADO can read an XML document into a Recordset, it can read only a proprietary schema known as the Advanced Data TableGram (ADTG).
In its quest to have a disconnected data-access mechanism, Microsoft extended ADO and introduced Remote Data Services. RDS is modeled after ADO and lets a Recordset be transferred to a client (e.g., a Web browser) without needing a live connection. However, RDS, like ADO, uses COM marshalling to transfer Recordsets from the server to the client.
The .NET Era
When Microsoft began designing the .NET Framework, the company took the opportunity to redesign the data-access model. Rather than extending ADO further, Microsoft decided to design a new data-access framework—but kept the acronym. Microsoft designed ADO.NET based on its experience with its successful ADO object model. But ADO.NET addresses three important needs that ADO doesn't address: providing a disconnected data-access model, which is crucial to the Web environment; providing tight integration with XML; and providing seamless integration with the .NET Framework (e.g., compatibility with the base class library's type system).
ADO.NET architecture. Figure 7 shows the ADO.NET architecture. The Recordset object, which performs so many functions in ADO, is now missing. In place of the Recordset object, ADO.NET has several dedicated objects to perform specific tasks. Table 1 describes three of these dedicated objects: DataAdapter, DataReader, and DataSet.
.NET data providers. An essential component of ADO.NET, a .NET data provider implements ADO.NET's interfaces. For example, a .NET data provider would implement the DataReader object so that either your application or the DataSet object could use it.
A data provider contains four main objects: Connection, for connecting to a data source; Command, which executes commands against a data source; DataReader, which reads data from a data source in connected read-only and forward-only mode; and DataAdapter, which reads data from a data source and uses that data to fill the DataSet object.
Visual Studio .NET includes two .NET data providers. The SQL Server .NET data provider is for connecting to SQL Server 7.0 and later databases. This access method is most efficient if you're using SQL Server 7.0 and later because the SQL Server .NET data provider communicates directly with SQL Server through the Tabular Data Stream (TDS) protocol. The OLE DB .NET data provider is for connecting to nonSQL Server databases such as Oracle or IBM DB2. This data provider uses the OLE DB provider for the respective databases.
At the time of this writing, Microsoft had just released a third data provider for .NET—the ODBC .NET Data Provider - Release Candidate Beta. You can download this data provider from the Microsoft site at http://www.microsoft .com/data/download_odbcnetrc.htm.
Figure 8 shows the various paths an application can take to connect to a database through ADO.NET. When you're choosing a path, the first factor to consider is which .NET data provider you should use. If you use SQL Server 7.0 or later, using the SQL Server .NET data provider makes sense. If the database is SQL Server 6.5 or any nonSQL Server database that comes with an OLE DB provider (e.g., Oracle), you'd use the OLE DB .NET data provider. Note that you can still use the OLE DB .NET data provider even if you're using SQL Server 7.0 and later, but you'll lose the performance benefit of communicating directly with SQL Server through TDS. However, the benefit of this nonspecific path is portability—you can interchange the databases you use without modifying code.
Next, determine the tasks you need to perform. If you simply need to read and display the data from a data source, the DataReader object is probably sufficient. But if you need to manipulate the data (perhaps you need to perform some editing and deletion), use the DataSet object. Use DataSet only when you need to because DataSet is inherently slower than DataReader. (DataSet uses DataReader to populate its table.)
An Example in ADO.NET
Let's look at ADO.NET in action in a Web service. Listing 2 shows a Web service that returns a DataSet object. The code in Listing 2 is similar to the code in Listing 1. The Web service in Listing 2 retrieves the Authors table from the Pubs database and exposes it as a Web service. The Web service uses the SQL Server .NET data provider, as the following namespace import line shows:
First, the Web service establishes a connection with the SQL Server 2000 database:
Dim conn AS New SqlConnection("server=localhost; uid=sa; password=; database=pubs")
Then, the Web service uses a Command object to execute a query against the database:
Dim comm AS New SqlCommand(sql, conn)
Next, the Web service uses a DataAdapter object to fill a DataSet:
Note that the connection closes as soon as the DataSet is filled, unlike connections in ADO, which must be open while you loop through a Recordset. The resulting DataSet returns as a Web service. Figure 9 shows a section of the DataSet that you get when you execute the new Web service. The DataSet, with its schema, is represented in XML format. A client application might choose to bind this DataSet by using a DataGrid. Listing 3 shows the code to perform this bind. Figure 10 shows the resulting DataSet bound to a DataGrid control in an ASP.NET Web application, which displays the DataSet in a more user-friendly format. ASP.NET provides many server controls that bind to DataSets automatically.
Using ADO in .NET Applications
Although ADO.NET has many powerful new features, you might want to continue to use ADO for any number of reasons. (For more information about deciding whether to convert to ADO.NET or use ADO and ADO.NET together, see Dino Esposito, "A Soft Landing to ADO.NET," page 37.) If you're developing new .NET applications, ADO.NET is the way to go. But if you're in the process of migrating your applications, you might want to retain ADO in existing projects and use ADO.NET to start new projects. The .NET Framework lets you use ADO in .NET applications through COM interop, which provides backward compatibility without requiring you to modify ADO. You need to import the ADO type library as an assembly, as Figure 11 shows. Then, you can use ADO as the sample code in Listing 4 shows.
Make an Informed Choice
Data-access technology is continually evolving. Before you've even mastered one technology, another comes along. Only one thing is certain: Databases play an increasingly important role in application development. Understanding the latest technology—and the evolutionary changes that created it—will help you choose the right technology for your current jobs and help you make informed choices as your organization's needs change.