Meditating on OLE DB and .NET
March 22, 2001
I don't know about you, but I didn't have a prompt reply ready when an existential question like, "What is software exactly?" hit me.
Consider the scene—you're in a tourist shop totally immersed in the very delicate task of buying useless things (mainly souvenirs) to make friends and relatives happy when they pick you up at the airport. In such cases, you're ritually asked a question like "First time here? Is it for business or vacation?"
So if you're somehow involved with software, and aren't on vacation, chances are you have to face that existential question.
So what is software about—exactly?
It's hard to answer existential questions, especially if you're walking around with a bag full of postcards, koala-pictured bibs, kangaroo peluches, and yellow signage warning about crocodiles.
I tried to keep my thoughts free flowing, but substantially simple. First off, software is about computers. Software is also about evolution. Certainly, software is about data and, in particular, about data storage and manipulation.
When I arrived to the hotel, my thoughts were processing the following point—what kind of evolution have I observed in recent years for data storage and usage? So I started meditating on OLE DB and its evolution in light of .NET.
The Darwinism of Software
Historically speaking, ODBC was the first serious attempt to create a uniform way in which applications access databases. As with everything else in software, ODBC was designed to meet a specific demand. It set a new stage in the never ending evolutionary process of information technology.
ODBC had to provide a common, and hopefully abstract, API to access databases irrespective of their internal details, languages, and tables organization. Over time, though, ODBC revealed to be progressively less and less adequate to successfully face the new incoming ways to design and build data-driven applications.
Since software makes no exception to Darwinism, ODBC adapted and survived with a different name, a different programming model, and new hot functionalities, but preserved its true vocation. ODBC continued to provide (more or less) open database connectivity under the name and the functions of OLE DB.
OLE DB is the programming interface that puts into practice the theoretical concepts of the Microsoft Universal Data Access (UDA) strategy. UDA provides the ability to access any type of data—relational, non-relational, and hierarchical—through a single, COM-based programming interface.
Designed as a component technology, OLE DB features a multi-layered model. On one side, you find server components that hold the data. On the other side of the COM bridge, you have client components that know how to connect and request data. The former are called OLE DB data providers; the latter are, instead, known as OLE DB consumers.
Both consumers and providers are COM objects and talk to each other through a set of COM interfaces. Such COM-based communication can be summarized in terms of actions performed on abstract objects like DataSource, Session, Command, and Rowset. So it happens that a consumer connects to a DataSource, opens a Session, issues a Command, and brings home a Rowset of data.
The Darwinistic evolution from ODBC brought UDA and OLE DB to add the ability to glue together, almost like a single relational table, all the enterprise data, irrespective of their relational, non-relational or even hierarchical nature.
The OLE DB Model
When it comes to data access, you have two basic choices. One is universalizing the data access strategy as UDA allows you to do. The other, instead, is geared towards the universalization of the data structure. It forces you to move every bit of information you may have out of its current data store and into a single, all-encompassing database server.
With OLE DB you try to glue together what your clients have today. Going the other way round, you force clients to scale up to a new, more powerful, unique DBMS, which has the ability to manipulate any format of information you need.
OLE DB is much more independent from the physical structure of the data than ODBC. In addition, it is not strictly based on SQL. OLE DB commands can be SQL statements as well as something else. In general, they can be seen as text strings written according to any syntax that the target provider can understand.
Like ODBC, OLE DB was designed with C++ in mind to maximize the performance of data access in middle-tier modules. For these same reasons, OLE DB is not directly usable from Visual Basic® or ASP.
Countless distributed systems, instead, were supposed to use Visual Basic to build components. That's the main reason why Microsoft introduced the ActiveX® Data Objects (ADO) library.
ADO has a richer programming interface than the raw OLE DB SDK. While it's definitely possible to use ADO from C++ applications, OLE DB calls pass through less layers of code and go more directly down to the data than the corresponding ADO code.
While ADO is clearly built on top of OLE DB, calls to raw OLE DB interfaces and calls issued through the ADO runtime have different relative speeds. This fact originated a sort of language-based dicotomy. What's better and more recommendable? The C++ high-performance level of OLE DB or the easier, more forgiving model of ADO in Visual Basic components?
Aside from providers and consumers, the OLE DB model also includes a third element—the OLE DB Services. A service is a COM component that processes the Rowset being returned to the consumer. It works as a sort of hook that monitors all the traffic between the consumer and the provider. ADO heavily relies on OLE DB Services to add its extended functionality like data shaping, persistence, and disconnected recordsets.
As people got serious about building distributed COM-based applications, a number of best practices have been developed for particular fields. To improve the scalability of Web applications, people turned to the data access disconnected model.
In a nutshell, the data consumer and the data provider aren't connected all the time. Once the connection is established, you issue the given query, fetch records out to an in-memory repository, and disconnect from the data source. You work offline on those records and, if needed, reconnect later and submit your changes. This model doesn't just work for everyone. Whenever it makes sense, though, it turns out to be extremely valuable in terms of scalability gains and overall performance.
A lot of systems out there have been (re)converted to employ ADO recordsets through the client-side cursor service which enables data disconnection. OLE DB has not been specifically thought of as such a model of interaction, so ADO is extended through an intermediate OLE DB service.
Thanks to the inherent flexibility of its architecture, OLE DB can be successfully used in a disconnected scenario, but it certainly doesn't represent the best way of working. Another subtle limitation of this implementation is that ADO recordsets are relied on to do so many things that there's a suspicion that they can't always do everything well. How can such an object be the fastest you can have to work both connected and disconnected, with and without XML, when fabricated or loaded from disk?
In addition, with OLE DB you have a significant lack of consistency if you consider that the bag of ADO functionality is remarkably different from the raw OLE DB SDK.
Thus ADO.NET becomes the next step in the evolutionary process of data access technologies. As the name suggests, though, ADO.NET is seemingly only the successor of ADO. What about OLE DB in .NET?
.NET Managed Providers
The timeless laws of Darwinism are now forcing the OLE DB technology to move one step forward to meet the demand of new users . In .NET, a Web application is primarily a disconnected application that makes use of freshly designed, ad-hoc tools for managing data.
The .NET Framework makes available classes to work with data. Such classes—specifically the ADO.NET and the XML namespaces—provide for collecting, reading, and writing. The ADO.NET and XML subsystems end up replacing both ADO and the OLE DB SDK so that now you have one single, and language neutral, way to getting or setting data.
ADO.NET classes abstract the data source even better than ADO because of their clearly data-centric design, opposite of the database-centric model still visible in ADO.
The .NET counterpart of OLE DB providers are called managed providers. Their role is explained in the picture below.
Figure 1. The managed provider's architectural diagram
As in OLE DB, you can recognize two interacting layers that, by assonance, I have called the managed consumer layer and the managed provider layer. To manipulate data, your .NET application doesn't need to seek out special classes or components acting as consumer modules.
A .NET application merely utilizes the DataSet or the DataReader object from the native framework and immediately becomes a "managed" data consumer. To physically fetch the data, you use instances of special classes that inherit from DataSetCommand and DBCommand. These classes represent your link to the data source.
Instead of instructing a rather generic object to go against a given provider, you simply use derived classes that already know how to cope with that given provider. So it happens that SQLDataSetCommand handles SQL Server databases and ADODataSetCommand wraps all existing OLE DB providers.
The managed provider is buried into such DataSetCommand classes. You never realize their presence and never need to specifically know about them. You use classes, set properties, and you're happy.
Under the hood, the managed provider layer in the figure above utilizes an interaction model not much different from the one in use with OLE DB, and even earlier with ODBC. The consumer command class targets a particular component that wraps a data source. It knows about the protocol that isused to read and write rows on the source. It also returns results in a format that .NET classes know perfectly how to handle.
For a better understanding, let's review the elements that concur to data retrieval in both OLE DB and .NET.
|OLE DB Providers||.NET Managed Providers|
|Identification||COM progID||Wrapped in the command class|
|Results through||Rowset or ADO Recordset||DataSet or DataReader class|
|Updates through||Provider's specific commands||Provider's specific commands|
Table 1. Comparing OLE DB and .NET data providers
The target provider is identified through its COM progID in OLE DB. In .NET, instead, such details are buried in the accessor class.
OLE DB providers always return rowsets—COM objects mainly exposing the IRowset interface. If you're accessing data through ADO, rowsets are then translated into richer and scriptable objects called recordsets.
.NET applications simply use different classes with different capabilities. The DataReader class is a simple, fast, forward-only cursor that works connected and provides access on a per-record basis. When you finish with it, you must explicitly disconnect . By contrast, the DataSet object is an in-memory, disconnected collection of tables. It is populated care of the DataSetCommand class. The content of a DataSet object is based on the XML stream the DataSetCommand class got back from the data source.
I'll be covering DataReader and DataSet classes in upcoming columns.
The data goes from the provider to the consumer in binary format, and through COM marshaling if you employ OLE DB. In .NET, instead, the managed provider returns an XML stream.
Both breeds of providers support a query language, which is normally SQL with vendor-specific proprietary extensions. Through this language, you perform updates and interrogate the data source.
So what's the difference between OLE DB and .NET data providers? Speaking abstractly, they share the same vision of data access. But managed providers are much simpler and specialized. They result in a better performance for two main reasons. First off, managed providers aren't supposed to use the COM Interop bridge to get and set data. Being COM components, OLE DB providers have no choice on this point. Furthermore, managed providers normally leverage the vendor's knowledge of the data source internals to get and set rows much faster. This is exactly also what OLE DB providers do, but when used within .NET OLE DB providers pay the price of their COM-based nature and need extra code to translate data into .NET-specific classes.
Existing Managed Providers
As of Beta 1, the .NET Framework features two managed providers: one for SQL Server (version 7.0 and higher) and one for all the data sources you can reach through an OLE DB provider.
The SQL Server managed provider is hidden behind specific classes like SQLDataReader, SQLDataSetCommand, and SQLCommand. Those classes utilize direct access to the low-level SQL Server file system. The picture below shows the provider's class diagram mapping the previous general schema to the SQL Server's managed provider.
Figure 2. Class diagram for the SQL Server managed provider
The managed provider for OLE DB plays the same role in .NET as the OLE DB provider for ODBC in Windows DNA systems. Basically, it represents the backward compatibility and the living proof that any .NET application can target any existing OLE DB-powered data source. The class diagram for the OLE DB managed provider is shown below.
Figure 3. Class diagram for the OLE DB managed provider
Notice that ADOxxx classes are expected to be renamed to OleDbxxx with Beta 2.
The OLE DB managed provider exposes .NET classes to the callers, but leverages the specified OLE DB provider to fetch rows. The communication between the .NET application and the underlying OLE DB provider (a COM object) takes place over the COM Interop bridge.
In general, in .NET you can access SQL Server 7.0 (and higher) tables through both providers. The managed provider of SQL Server goes straight to the DBMS file system to ask for data. The OLE DB managed provider, instead, relies on the services of the SQLOLEDB OLE DB provider, resulting in an extra layer of code to be traversed.
Today, if you target any data source other than SQL Server, the OLE DB managed provider is the only way you have to go. Through this same channel, you can also reach any ODBC data source.
The managed provider for OLE DB is a thin wrapper built on top of the COM Interop bridge calling into the native OLE DB provider. Aside from setting up and terminating the call, such a module also takes care of packing the returned rowset into a DataSet or an ADODataReader object for further .NET processing.
At the level of .NET code, accessing a SQL Server table through the native managed provider, or through the OLE DB provider, is essentially a matter of changing the prefix of the involved classes. Here's the code for SQL Server:
Dim strConn, strCmd As String strConn = "DATABASE=Northwind;SERVER=localhost;Integrated Security=SSPI;" strCmd = "SELECT * FROM Employees" Dim oCMD As New SQLDataSetCommand(strCmd, strConn) Dim oDS As New DataSet oCMD.FillDataSet(oDS, "EmployeesList")
And here's the code for the OLE DB provider (differences in bold):
Dim strConn, strCmd As String strConn = "Provider=SQLOLEDB;" strConn += "DATABASE=Northwind;SERVER=localhost;Integrated Security=SSPI;" strCmd = "SELECT * FROM Employees" Dim oCMD As New ADODataSetCommand(strCmd, strConn) Dim oDS As New DataSet oCMD.FillDataSet(oDS, "EmployeesList")
As you can see, on the surface the differences are very minimal; just the connection string and the command class. Employing one class or the other, instead, makes a big difference.
Existential Questions About OLE DB
.NET managed providers represent the next step in the evolution of data access technologies but, as of Beta 1, there's no documented SDK to write data-source specific managed providers. Waiting for Beta 2, a few basic questions about OLE DB and .NET cannot be skipped.
Is all the code developed for OLE DB just legacy code? What will come of all the effort that companies put (and often are still putting) into writing providers for their own data?
Have faith—OLE DB is not a dead technology. Period. It still remains a fundamental specification for a feature-rich and general-purpose, .NET-independent programming interface. It is not specific to .NET, but it is well supported.
That said, if you have custom data to expose, you cannot ignore the advent of .NET and managed providers. What then is the best interface to dress up your data providers? How should you plan to expose your data right now, for example, starting next Monday morning at 8 A.M.?
.NET utilizes open standards and is extensively based on XML. Given this, if you have proprietary, yet text-based data to expose, you can simply consider publishing it using XML, perhaps using a custom schema. There are so many facilities in .NET to work with XML data that coming up with a wrapper class shouldn't be an issue at all.
For more complex data stores, OLE DB providers still make sense because you need to reach a much larger audience that may not be bound to .NET. For .NET specific applications, a managed provider can certainly give substantial performance advantages, but I would be very careful in this case as well—especially this Monday! Don't forget, no SDK for managed providers has been released yet, but Microsoft is committed to delivering one.
So summarizing, the next data provider I'm supposed to start writing this Monday morning will consist of a pair OLE DB providers plus a .NET wrapper class talking in XML. My first option wouldn't be to have the .NET class wrapped around the OLE DB provider through COM Interop. I'd rather employ the same, somehow adapted, source code. In this case, Managed C++ is probably the best language to use to facilitate the "physical" code reuse.
The Long Run of OLE DB
Take this as a sort of a forecast to be verified a couple of years from now. In the long run, I hazard to say that OLE DB will come to the same, relatively bad, end of SGML—the Standard Generalized Markup Language, forerunner of XML.
Introduced as the savior of the world of data exchange, SGML never became a de-facto standard, probably because it was too powerful and complex for everyday use. Fact is, its inspiring principles have been widely accepted only after being properly narrowed and specialized to originate XML.
My forecast is that as soon as .NET takes root, OLE DB will progressively lose importance until it disappears. I can't say how long this process will actually take, but I would bet on it.
Quotes available soon <g>. Stay tuned.
Dialog Box: Yes, I Feel Legacy Now
You probably live in another time dimension! How can you define legacy code as existing ADO code, in most cases written six months ago or later? How can you do that in the name of a technology/platform, .NET, that is not even at the second stage of its beta program?
What would life be without a bit of emphasis? Good point, anyway.
Can we really define legacy code as all the sharp rocks of ADO code found in many recent DNA systems? My answer is still "yes, we should." But I do understand that it definitely sounds baffling.
I consider "legacy code" the code that is no longer aligned with the core of the hosting platform. Believe me, this is exactly what is going to happen with .NET. Of course, there will be ways to integrate existing code, components and, applications in .NET.
.NET is a non-violent revolution that in the next few years will absorb any living instance of software in Windows. Resistance is futile—you'll be assimilated. Regardless of the age of the code, in my definition of what's legacy it's the alignment between sources and runtime that really matters.
.NET changes the Windows runtime making it managed. While COM and the Windows SDK are not dead, you have to write code according to another model. Regardless of the underpinning of this new runtime, there's a brand new model to cope with. And this model will be the future model of Windows.
Windows is not dead, but it'll change. COM is not dead, but it'll have the face of .NET classes. ADO is not dead and it still works, but the .NET features of ADO.NET are the future of ADO.
.NET is not simply Windows 6.0 and ADO.NET is not a fancy new name for what could have been called ADO 3.0. It's different and pervasive. It's a new platform. All the rest is either another platform or, when integrated, is legacy code.
Legacy code has no age. I understand that people will be writing DNA systems this week, six months from now, and even when .NET actually ships. I'm not saying this is necessarily wrong or should be absolutely avoided. Simply be aware that you're swimming against the stream.
Dino Esposito is Wintellect's ADO.NET expert and a trainer and consultant based in Rome, Italy. Dino is a contributing editor to MSDN Magazine and writes the Cutting Edge column. He also regularly contributes to Developer Network Journal and MSDN News. Dino is the author of the upcoming Building Web Solutions with ASP.NET and ADO.NET from Microsoft Press, and the cofounder of http://www.vb2themax.com/. You can reach Dino at firstname.lastname@example.org.