ADO: A Data-Centric Object Model

Last spring, when I first heard that Microsoft was prepared to serve us a brand new data-access layer called ADO.NET (formerly known as ADO+), my first thought was, "It's always the same story: Redecorate the house at the end of the fiscal year. But why invest so much money for a brand new data access layer?"

As the news evolved from inside information to public information, I started receiving messages from readers and customers about ADO.NET, and most of the correspondence ran along the same lines: Is ADO.NET any better than ActiveX Data Objects (ADO)? For that matter, is ADO any better than Remote Data Objects (RDO)? And is RDO, in turn, any better than Data Access Objects (DAO)? Are these models anything more than just anagrams of one another? Let's examine ADO.NET and discuss why it really is what you need at this stage of the Windows evolution.

The Programmable Web
We're now experiencing what many identify as the third age of the Web—the age of programmability. The programmable Web consists of a set of Web applications that can display valuable information programmatically, not just interactively, through a colorful and animated UI. Before the programmable Web, Web applications were like fish swimming side-by-side in adjacent fish bowls. The fish could move about freely within their own bowls, and they could see the fish swimming in other bowls, but the glass separating them presented an impenatrable obstacle. As a user, you could interact with any Web application, but you couldn't pretend that any of them could interact with each other. In the age of the programmable Web, all the bowls will merge to form a common aquarium.

What will these developments mean for you, as a developer? Your Web applications will need to become more scalable and more interoperable. The applications must employ a multi-tiered design to become more scalable, be capable of exchanging data in a common format, and shed their original database-driven designs so that they can evolve to more modern data-centric designs.

ADO's Evolution
So far, data access has been a relatively small issue for most IT managers and consultants, whose main concern has been choosing the most cost-effective database server. However, middle-tier components become a vital necessity as soon as you need more scalable solutions. The middle tier is the logical layer where data access components typically reside. Data access components in a modern system are the only pieces of code that know about the database and connection details.

To write data access components capable of targeting any possible data source, you need a specialized data access layer. The requirements that this specialized layer must meet have changed over time. Originally, when the layer consisted of a mere COM interface built on top of ODBC drivers, it was called RDO. Through easily callable COM objects, RDO gave you access to all the entities that form the database: connection, query, command, and result set. However, RDO was completely tied to ODBC, and it became obsolete when Microsoft introduced a new data access technology called OLE DB. Aimed at providing a common programming interface for any compliant data source, OLE DB includes ODBC as a special case. However, Visual Basic (VB) and Active Server Pages (ASP) programmers soon needed a new and made-to-measure layer for data access. Enter ADO, which has evolved significantly during the past couple of years to meet the improvements in the OLE DB technology. ADO has introduced minimal support for XML, support for streams, and support for hierarchical and semi-structured blocks of data.

The Recordset Object
The main object in the ADO model is the recordset, which represents a group of records that you might have extracted from a database, fabricated manually, or obtained from a non-database OLE DB provider, such as ADSI or an Exchange provider. The Recordset object features several interesting capabilities. For example, it acts as a super-array that lets you to work with records in a connectionless scenario. You can sort, filter, bookmark, and select records. You can also use the recordset object to page between records with extremely limited code. You can modify records in memory and then apply your changes to the underlying data source using a batch update procedure.

Despite this wonderful set of capabilities, the Recordset is no longer the perfect object for today's needs. First, it's a COM object, which makes it suitable especially, if not exclusively, for the Windows platform. In an open and interoperable world, an object that can't easily and efficiently transfer from module to module and from platform to platform is simply not a good object. Second, because of the COM nature of the object, transferring a record from module to module within the Windows platform poses some serious performance issues that affect the overall structure of the application. Transferring COM objects over a network is more burdonsome than sending out strings.

An ADO recordset presents one table of data with one data source. If you want a recordset to render records from two or more tables, you must use database-level JOINs. After running the JOIN, the recordset content still consists of one collection of structurally identical records. In addition, when this model matches your requirements, sooner or later you risk running into trouble. In fact, if you're working with a table of records that results from a JOIN, ADO is no help when you want to submit changes to the original data source. This JOIN process can't happen in a "smart" way, but only through optimistic lock and copying field to field and record to record. The rub is that if the recordset comes from a JOIN, you don’t have just one table to update!

DataSets to the Rescue
ADO.NET has a slightly different object model than ADO and features a new object called the DataSet. Microsoft intends the DataSet to be a collection of data tables, which are much the same as ADO recordsets. The advantage of using a DataSet, however, is that you can now establish dynamic relationships between tables, holding two separate tables in memory while using joined data. For example, with ADO.NET, customers and invoices are two distinct tables within the same DataSet, and you use the customerID field to join the tables in a relationship. With ADO, you use a SQL JOIN command to merge the tables into one all-encompassing table. When you need to update, you can manage tables separately.

DataSets represent a data-centric, rather than a database-centric, approach that pushes the idea of disconnection from the data source as the key to improving scalability. At the same time, the DataSet programming interface doesn’t lack any of the features that you typically find in a database, including indexing, sorting, relations, constraints, scrolling, and updating. However, the ADO.NET runtime provides these database-like features; they're independent from the actual data provider capability. The basic rule is, "data is just data, no matter the source."

XML is another great chapter. A DataSet is built in XML, and any change that you make to the tables reflects immediately in the output XML stream. You can choose to navigate through the tables using the traditional sequential model, moving record after record, or using the XML object model methods. Whatever the data source, you can work against it through the XML interface and with the collection model mode of adding, modifying, and deleting rows.

However, the DataSet isn't the only way to work with data in ADO.NET. The DataReader object, which is a simpler structure than an ADO recordset or a DataSet, is a good solution when you just need a forward-only, read-only cursor.

Is ADO.NET just "yet another data access layer?" Definitely not. ADO.NET is the substrate that will form the foundation of data-handling applications that run on the .NET platform. No matter which of the various programming models you choose (e.g., WinForms, WebForms, Web services), as long as you read data from a source, ADO.NET and its rich set of objects can help—and it will always do so using the same consistent programming model.

Comments

Plain text