By Kraig Brockschmidt and Lorenz Prem, Program Managers, Microsoft Corporation
Last updated September 2010
Over the last twenty years, Microsoft has been developing ever-more powerful and flexible data access solutions. Some are very specialized, others are general-purpose, but all of them share two common goals: to help applications get at the information they need and to help developers spend less time dealing with intricate details of data storage and more time creating sophisticated software that utilizes that data for real benefits to their customers.
In this historical overview, you’ll quickly see that there are as many different data development technologies as there are different types of data sources. Some technologies have been retired, but most of them are still active and more keep being developed. Why is that? Why does Microsoft keep creating additional ways to do essentially the same thing?
The central reason really has nothing to do with the particulars of accessing or manipulating data, nor even the particulars of databases. Fundamentally, it has to do with the fact that the most long-lasting element in nearly all computer systems is data. Not the hardware, not the design architectures, not the database technology nor data access APIs, and certainly not the applications that are built on data access technologies.
No, data outlives them all, because it’s data that essentially holds the facts of our existence—as people, organizations, and nations—and as such those facts will remain facts forever. Sure, such data gets migrated from one system to another (and another, and another). And maybe it gets updated on occasion, as when someone is born, when someone changes their name, or when we discover that we really should have used four digits to store a year rather than two. But so far as computer systems are concerned, data is effectively immortal, and everything else is but a wrapper. Systems may be upgraded or fall into obsolescence, and perhaps one database vendor makes a better sales pitch than the competition. Whatever the case, data gets moved around all the time…and yet the data itself stays the same and is still there waiting to be accessed and put to good use.
Every layer, then, that is built up around the data is increasingly more ephemeral—that is, shorter-lived—than those closer in. Data has more longevity than the database engine, which in turn generally outlives data access technologies. As you read this article, for example, note the constancy of the Microsoft SQL Server box in the illustrations: it stays where it is in the lower right corner, outliving its original data access technologies as it will likely outlive current ones and even those that are just now in development.
Of course, those data access technologies in turn outlive specific applications built on top of them. Another way of saying this is that specific data access technologies serve a particular means of building applications—such as unmanaged Win32 code or managed .NET code—or a particular type of application—such as a rich client, web application, or server application. So they naturally have a longer lifespan than any given application (just as applications have a longer lifespan than the forms they present to the user, which easily change from version to version).
Data access technologies come, and occasionally go, simply because they each accommodate the development needs around different kinds of applications and different kinds of data stores. This is why there are technologies for managed (.NET) and unmanaged (Win32) code, and technologies that accommodate applications wanting to be database-agnostic as well as those wanting to optimize for a particular source. It's really just a matter of finding which method best fits your particular requirements. (For a concise selection guide across all presently-supported technologies, see the Learn page on the Data Developer Center.)
Along with the data access technologies, support services started appearing. Every company, it seems, requires reporting, analysis, and other services to run on the data contained in the databases deployed throughout the enterprise. At first these support features did not exist, or existed only in a very limited fashion. As time progressed they improved, and with them the value that could be derived from data stored in databases grew. The data will outlive these services as well, but these services will continue to add more and more value to the database model.
A Brief Nod to Microsoft Access
In the early 1990s, Microsoft had two primary database offerings: SQL Server and Access. Microsoft Access, for its part, was and remains a fabulous combination of a database engine and a front-end design environment, allowing rapid development of complete database applications without traditional coding practices. But that’s as much as we’ll say about Access here, because with a solution like Access you really don’t need to concern yourself with the mechanics of talking to a database. Access does that under the covers. What’s more, as a desktop client solution, Access only targets a portion of the database applications people want to create; many applications do need to concern themselves with getting to the data directly.
This was true with SQL Server from the beginning, which in itself was all about high-performance data access suitable for large numbers of concurrent users. Its focus was on serving up the data to whatever applications wanted to get at it, however those applications were written. In that sense, little has changed over the years except that the landscape of data sources is much more expansive, the types of database applications are much more varied, and the means of the latter talking to the former have been constantly built up, just as a small community with a couple of houses grows and matures into a full-fledged township. And it’s this particular story of growth that puts all of Microsoft’s data development technologies into context.
The Foundations of Data Development: Win32 ("Native") Technologies
When Microsoft first introduced SQL Server 1.0 in 1989, there was a single programmatic or “call-level” API called DB-Library. Through its 150 functions, an MS-DOS or OS/2 console application or a Windows or OS/2 GUI application could party on data with the create-retrieve-update-delete operations we still know and love today (ironically referred to with the wonderful acronym of CRUD).
Also available was Embedded SQL for C (ESQL for C in its short form), a precompiler that allowed SQL statements directly in source code, a simple foreshadowing of what we’ll see later in LINQ. (If you’re interested, a basic piece of ESQL for C code can be found in the SQL Server 2000 documentation.)
At this point in time, other databases had their own proprietary APIs like DB-Library, but all of these were completely separate as shown in Figure 1.
Figure 1: Data access technologies circa 1990; Microsoft’s offerings were only those for SQL Server.
In September 1992, Microsoft released the Open Database Connectivity specification, or ODBC. ODBC, still in widespread use, is a 50-function call-level API that communicates through drivers to any number of underlying databases or database-like stores. ODBC builds a data access abstraction on top of proprietary APIs, giving applications a single means to get to a wide variety of data sources from the oldest legacy dinosaur to the latest cutting-edge technology. Database vendors, if they choose, can also supply a native (that is, no-middlemen) ODBC driver to achieve better performance (as Microsoft does with SQL Server).
Applications, of course, are still free to use proprietary APIs; ODBC simply provided a way for applications to isolate themselves from such APIs, as when those applications needed to operate against multiple databases from different vendors.
Figure 2: Data access technologies in September 1992.
A few years later, as object technologies such as COM and object-oriented languages such as C++ and Visual Basic came into the mainstream, new object-oriented access layers were created. First came Data Access Objects (DAO) in Visual Basic 3 (November 1992). This was later replaced in Visual Basic version 4 (August 1995) with Remote Data Objects (RDO), a VB-compatible object layer on top of ODBC. A year later in August 1996, Microsoft released the much more generalized OLE DB technology to create an object-oriented access layer alongside ODBC, working again on a data store-specific provider model.
OLE DB, which is also still in widespread use, doesn’t raise the level of abstraction for data access above that of ODBC; it simply provides a programming model that's suitable to certain development styles. Still, at the time it was released it also helped open up even more types of data stores, specifically those that could be represented as tabular data (such as spreadsheets and text files). In that way OLE DB was an advance over ODBC.
Figure 3: Data access technologies in August 1996.
OLE DB’s COM interfaces, however, require pointer support, which put them out of reach for languages like Visual Basic. The result was ActiveX Data Objects, or ADO, a higher-level object abstraction built on top of OLE DB, available to pointer-capable and pointerless programming languages alike. ADO was first released in October 1996 (and had thus obviously been in development for some time along with OLE DB).
Figure 4: Data Access in October 1996.
There were then six different Microsoft APIs for data access depending on how an application was written and the kinds of data stores it wanted to access:
|                       | SQL Server DBs                            | DBs w/ ODBC driver | DBs w/ OLE DB driver |
|-----------------------|-------------------------------------------|--------------------|----------------------|
| Apps written in C/C++ | DB-Library, ESQL for C, ODBC, OLE DB, ADO | ODBC, ADO          | OLE DB, ADO          |
| Apps written in VB    | RDO, ADO                                  | RDO, ADO           | ADO                  |
Since that time, many of these technologies have lived on as dependable workhorses that continue to support the many kinds of applications that developers need to create. Some improvements have been introduced, and some older technologies have been retired as newer ones have come in to fill the same needs. Support for DB-Library ended with SQL Server 2000, for example, as a new and better direct API, SQL Server Native Client, was introduced with SQL Server 2005. Similarly, the development of Language-Integrated Query (LINQ) ended the long run of ESQL for C with SQL Server 2008.
RDO, similarly, was supported through Visual Basic version 6, after which Visual Basic became Visual Basic .NET and the focus shifted to ADO.NET (as we'll see in the next section).
Indeed, the introduction of the .NET Framework (version 1.0 in February 2002 and 1.1 in April 2003) shifted the primary focus of most new development efforts into the managed-code space. Thus, while the ODBC, OLE DB, and ADO technologies continue to age, they remain the primary data access technologies for unmanaged code (that is, Win32) applications. Today they are collectively known as the Microsoft Data Access Components (MDAC) or the Windows Data Access Components (WDAC), and are part of the Windows SDK.
There is also some continued evolution in these technologies, such as ADO Multi-Dimensional (ADOMD), ADO Extensions for DDL and Security (ADOX), the SQL Server Java Database Connectivity (JDBC) driver, and the PHP driver for SQL Server. See Figure 5 and the table that follows it. In short, with or without the .NET Framework, there is still an ongoing need for direct, optimized access to data stores like SQL Server from unmanaged code environments.
For more information on these, visit the links above as well as the SQL Server Developer's Center and the Data Access Technologies Road Map [up to ADO.NET]. A similar article to the latter is Data Access Technologies, which again is primarily focused on these Win32 technologies.
And with that we’ll leave the unmanaged code realm and shift our focus as well to developments in the kingdom of .NET.
Figure 5: Data Access Technologies for Unmanaged Code, current to September 2010.
Current Choice Matrix
|                                                 | SQL Server DBs               | DBs w/ ODBC driver | DBs w/ OLE DB driver |
|-------------------------------------------------|------------------------------|--------------------|----------------------|
| Apps written in C/C++                           | ODBC, OLE DB, ADO            | ODBC, ADO*         | OLE DB, ADO          |
| Apps written in other languages (VB, Java, PHP) | ADO, PHP Driver, JDBC Driver | ADO*               | ADO                  |

* Accessible through the Microsoft OLE DB Provider for ODBC (MSDASQL)
Data Development Today, Part 1: Direct Data Access
The introduction of the .NET Framework in all its glory might have changed the way applications were developed and the tools used to develop them, but it did not change the fundamental need for applications to access data. Indeed, the introduction of the .NET Framework and the development of .NET data access technologies further demonstrate the underlying longevity of data and of the database technologies used to store that data, neither of which changes with .NET.
According to its basic design, the .NET solution to any programming need is to supply appropriate .NET classes with the desired APIs. In the case of data access, primary classes include SqlConnection, SqlCommand, DataReader, DataSet, and DataTable (along with server controls for ASP.NET applications). Such classes together compose the core of ADO.NET.
ADO.NET is basically the .NET sibling to ADO, exposing a similar conceptual model while also offering expanded capabilities. For a comparison of the two technologies, see ADO.NET for the ADO Programmer. (Even if your only knowledge of ADO comes from this present article, the comparison still provides an excellent overview of ADO.NET in its own right.)
Like its predecessor technologies, ADO.NET works through an abstraction layer that hides the details of the underlying storage technology. That layer is provided by ADO.NET “data providers,” and Microsoft supplies a native provider for SQL Server as well as providers for OLE DB and ODBC (see .NET Framework Data Providers in the .NET 1.1 documentation). What’s more, ADO.NET supports transparent access to XML sources in the same manner (something ADO did not do). These relationships are illustrated in Figure 6. (Note: for a more comprehensive version of this diagram, see the .NET Data Access Architecture Guide article. Also note that while the DataSet class is part of the .NET System.Data assembly, DataReader and DataAdapter are implemented by providers, as are provider-specific classes like SqlConnection and SqlCommand.)
Figure 6: The introduction of ADO.NET in 2002/2003.
This view of data access through ADO.NET still remains current today, and the majority of .NET applications are actively using the DataSet class (and related classes like DataAdapter and SqlCommand) to talk to their underlying data stores. (For details on the differences between ADO.NET in the .NET frameworks version 1 and 2.0, refer to What’s New in ADO.NET [.NET Framework 2.0]).
Now it’s important to point out, as a prelude to the latest technologies, that the basic data access programming (that is, querying) model in ADO.NET is pretty much the same as we’ve known since the beginning of this history:
- Open a database connection
- Execute a query in the database
- Get back a set of results
- Process those results
- Release the results
- Close the connection
(As an example, compare the structure of the basic DB-Library code in the SQL Server 2000 documentation and that of the basic ADO.NET code available in the All-In-One Code Framework, specifically the project called CSUseADONET and the code path through the SelectUsingUntypedDataSet method).
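The six steps above can be sketched in just a few lines. The following uses Python's built-in sqlite3 module purely as a stand-in for any database and access API (the table, column names, and data are invented for illustration); the shape of the code is what matters, not the library.

```python
import sqlite3

# 1. Open a database connection (in-memory for this example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders (total) VALUES (19.99), (5.00)")

# 2. Execute a query in the database.
cursor = conn.execute("SELECT id, total FROM orders WHERE total > ?", (10,))

# 3. Get back a set of results.
rows = cursor.fetchall()

# 4. Process those results.
for order_id, total in rows:
    print(f"order {order_id}: {total}")

# 5. Release the results.
cursor.close()

# 6. Close the connection.
conn.close()
```

Swap in DB-Library calls, ODBC handles, or an ADO.NET SqlCommand and the skeleton stays recognizably the same.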
In other words, while the APIs and other details have changed throughout the years, the essential structure of the code has remained the same. This is not a bad thing, because this programming model clearly matches the basic purpose of databases: to ask questions and get back answers. It would be a safe bet, in fact, that whenever a new general-purpose development environment comes along (on the order of the .NET Framework), there will be new data access technologies that follow this model. In the meantime, ADO.NET will continue to be a core means of data access.
But there are still plenty of interesting things to do where data access is concerned. In fact, two of the most interesting technologies, Language-Integrated Query (LINQ) and the ADO.NET Entity Framework, were introduced with the .NET Framework version 3.5 and version 3.5 SP1, respectively. In many ways these technologies represent much more significant strides than the transition from ADO to ADO.NET.
Introduced in November 2007, LINQ keeps alive the ghost of ESQL for C: the idea of embedding queries directly within the structure of a programming language itself. As evidenced in its overview, LINQ is a far, far better solution than its distant ancestor in these ways:
- LINQ can be used from any .NET language that supports it (like C# and Visual Basic)
- LINQ allows for many modern performance optimizations
- LINQ can operate against (that is, query and update) arbitrary data sources
Those data sources can be anything from in-memory CLR objects to XML to databases to ADO.NET DataSets to any other source upon which someone happens to implement a .NET class with the IEnumerable or IQueryable interfaces. These are shown in Figure 7.
Figure 7: The introduction of LINQ in November 2007, including LINQ to Objects, LINQ to XML, LINQ to DataSet, and LINQ to SQL.
It shouldn’t be surprising, then, that there are not only a healthy handful of “LINQ to XYZ” implementations that Microsoft provides, but there are literally dozens of third-party implementations including things like LINQ to Flickr, LINQ to Amazon, LINQ to CSV, LINQ to MAPI, and LINQ to Twitter. In short, if something can look like a queryable data store, you can put an IQueryable interface on it and make it work with LINQ.
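The flavor of such a query—declarative, composable, and deferred until enumerated—can be illustrated by analogy with a Python generator pipeline (this is a conceptual sketch only, not LINQ syntax, and the data is invented):

```python
# In-memory objects standing in for an IEnumerable<T> source.
people = [
    {"name": "Ada", "city": "London"},
    {"name": "Grace", "city": "New York"},
    {"name": "Alan", "city": "London"},
]

# Like a LINQ query expression, this generator only *describes* the
# query; nothing executes yet (deferred execution).
londoners = (p["name"] for p in people if p["city"] == "London")

# Enumeration triggers execution, just as iterating an IEnumerable
# does in .NET.
result = sorted(londoners)
print(result)  # ['Ada', 'Alan']
```

In C#, the equivalent query would read as `from p in people where p.City == "London" select p.Name`, running against objects, XML, a DataSet, or a database depending on the provider behind it.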
The other recent development in data access takes another step beyond LINQ. That is, while LINQ introduces a remarkably simple yet powerful query mechanism, especially in working with DataSets and SQL Server databases, it doesn’t change the essential nature of how the application sees relational data. Every technology for relational data that we’ve seen so far—DB-Library, ESQL for C, ODBC, OLE DB, ADO, ADO.NET, and LINQ—inherently works directly against the relational structure of database tables.
This is all well and good for simple databases and simple applications. However, as a database gets richer and an application more sophisticated, we see increasing divergence between the way an application needs to look at the data—its “conceptual” or “object” model—and the way information is structured in the database—the “storage” or “relational” model. A manufacturer's ordering system, for example, may store its order information in multiple related tables, yet the application programmer really wants to work with a single, conceptual “order” entity without having to do a complex JOIN in every order-related query. (This is a case of what’s called the “object-relational impedance mismatch,” if you’ve heard that term.)
For this reason, programmers have routinely implemented their own mapping layers to create those singular, conceptual entities. Such a layer supplies a single “Order” object, for example, with methods like Retrieve and Update that internally perform the necessary database-level queries to shuttle information between the Order object and the underlying tables.
Mapping layers, in short, conveniently isolate the application from the specifics of the logical database structures. It’s a very common pattern, one found in projects written with everything from DB-Library to ADO.NET and LINQ to SQL. What’s not so convenient is that these layers are tedious to implement and maintain.
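A hand-written mapping layer of this kind might look like the following sketch (Python with sqlite3 as a stand-in for any database; the schema, class, and method names are invented for illustration): a single conceptual Order object whose retrieve and update methods hide the fact that order data spans two related tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_header (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_line (order_id INTEGER, product TEXT, qty INTEGER);
    INSERT INTO order_header VALUES (1, 'Contoso');
    INSERT INTO order_line VALUES (1, 'Widget', 3);
""")

class Order:
    """One conceptual entity mapped onto two relational tables."""
    def __init__(self, order_id):
        self.id = order_id
        self.customer = None
        self.lines = []

    def retrieve(self):
        # The JOIN the application code never has to see.
        rows = conn.execute(
            """SELECT h.customer, l.product, l.qty
               FROM order_header h JOIN order_line l ON l.order_id = h.id
               WHERE h.id = ?""", (self.id,)).fetchall()
        self.customer = rows[0][0]
        self.lines = [(product, qty) for _, product, qty in rows]

    def update_customer(self, name):
        conn.execute("UPDATE order_header SET customer = ? WHERE id = ?",
                     (name, self.id))
        self.customer = name

order = Order(1)
order.retrieve()
print(order.customer, order.lines)  # Contoso [('Widget', 3)]
```

Multiply this by every entity in a real system and the tedium becomes obvious—which is exactly the plumbing the next technology automates.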
That changes with the Entity Framework introduced in August 2008 with the .NET Framework version 3.5 SP1 (see overview) and greatly improved with the .NET Framework 4 in April 2010. The Entity Framework, as it’s simply called, automatically generates a mapping layer of .NET classes from an existing database, along with a concise textual description of the mapping between conceptual and relational levels, which you can customize however you like. Whatever the case, the Entity Framework provides full LINQ access to the resulting entities (LINQ to Entities), and as it’s built on top of ADO.NET the Entity Framework directly leverages the ADO.NET providers.
The conceptual and relational representations can be easily created using designers in Visual Studio, and the Entity Framework designer in Visual Studio will create a default mapping without any effort on your part. In other words, don't think that working with the Entity Framework introduces more complexity than other solutions—it is, in fact, easier than everything else for core scenarios and offers much more flexibility beyond the basics. And the bottom line of it all is that you can focus your efforts on what you really care about—the conceptual objects and mapping—and not labor over all the tedious, error-prone plumbing code that’s traditionally left to summer interns.
This brings us to the current state of the art as of September 2010 as shown in Figure 8.
Figure 8: The Entity Framework, first released in August 2008, automates the hard work of conceptual mapping. An Entity Data Model is used at compile time to generate classes for a mapping layer.
Data Development Today, Part 2: Data in the Cloud
As subtly betrayed by the title of the last section, everything we’ve seen to this point applies primarily to applications that can access a database in some direct manner, such as the file system, named pipes, etc. This applies to client applications accessing databases on an intranet as well as web applications accessing backing stores on their web servers (and web applications written in ASP.NET have full access to ADO.NET and Entity Framework, of course).
What we haven’t talked about are applications that access data more indirectly, such as web or rich internet applications that wish to create and consume REST-based data services.
To be honest, this subject starts to bridge us into the world of the Windows Communication Foundation (WCF) and the world of web services in general, which goes beyond the scope of this article. We mention it, however, to introduce WCF Data Services (formerly ADO.NET Data Services and code name “Astoria”), a framework that facilitates this communication behind the scenes of what we’ve been exploring so far (see Figure 9).
Figure 9: WCF Data Services facilitates creating and consuming REST-based data services.
The goal of Data Services is to facilitate near-turnkey creation of flexible data services that are naturally integrated with the web, using URIs to point to pieces of data (such as entities in an Entity Data Model) and simple, well-known formats to represent that data (such as JSON and XML). Other web applications (and agents) can interact with the resulting REST-style resource collection using the usual HTTP verbs such as GET, POST, or DELETE. Indeed, so universal are the conventions for working with data services that they have been formalized as the Open Data Protocol, information on which can be found at www.odata.org.
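The addressing convention—a URI names the data, and system query options shape the result—can be sketched in a few lines (the service root and entity set names here are hypothetical, and this is only an illustration of the URI style, not a client library):

```python
# Hypothetical service root for illustration.
service_root = "https://example.com/odata"

def entity_uri(entity_set, key=None, **options):
    """Build an OData-style URI: /EntitySet(key)?$option=value."""
    uri = f"{service_root}/{entity_set}"
    if key is not None:
        # A key in parentheses addresses a single entity.
        uri += f"({key})"
    if options:
        # System query options are prefixed with '$'.
        uri += "?" + "&".join(f"${k}={v}" for k, v in options.items())
    return uri

print(entity_uri("Orders", 1))
print(entity_uri("Orders", top=10, format="json"))
```

Issuing an HTTP GET against such a URI returns the addressed data; POST, PUT, and DELETE against the same URIs create, update, and remove it.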
Many of the Microsoft cloud data services, such as Windows Azure tables, SQL Data Services, and SharePoint, expose data using the Data Services protocol. This means that the Data Services client libraries and developer tools can be used to work with both on-premise and hosted cloud services.
Data Development Futures
If we take a step back from the whole arc of the discussion we’ve had in this article, we can see a definite trend toward greater levels of abstraction. The earliest data access solutions for SQL Server, like DB-Library and ESQL for C, were sufficient though quite rudimentary. Technologies like ODBC created an abstraction layer above the proprietary APIs of different databases, a model that OLE DB, ADO, and ADO.NET continue to follow.
Most recently, the Entity Framework has gone a step further to create an additional abstraction layer not over the data access API but over the structure of a relational database itself. Similarly, Data Services transform any number of diverse data sources into something accessible through a simple REST-based exchange protocol, the Open Data Protocol or OData (see http://www.odata.org). In fact, Microsoft expects that using such protocols will become increasingly popular, as it allows data providers and consumers to evolve independently from their programming model.
Microsoft is continuing to evolve the Entity Framework, WCF Data Services, and the Open Data Protocol; these are the areas that will see continued investment. For more information, see Entity Framework futures and Data Services futures.
Application Services in Microsoft SQL Server
Soon after the launch of the first database it became apparent that users wanted to do more with the data than just read and write it. Data has the wonderful property of being useful in many scenarios besides the one for which it was originally created. Users wanted to gather it, transform it, analyze it, and make decisions based on it. To address these needs, Microsoft started development on many value-added application services based on SQL Server.
To begin with, Data Transformation Services (DTS) was introduced with SQL Server 7.0 in 1998 as an application model for data transformations that run on the database server. Developers could define application packages that contain data transformation logic, with packages triggered by events in the database. DTS was viewed as a significant improvement over the client-application-only model that existed before, and it remains the right choice for workflow-like applications that are triggered by database changes.
As shown in Figure 11, SQL Server 7.0 also included another new data service: Online Analytical Processing (OLAP) services. OLAP is a method for answering multidimensional queries, an important tool in data analysis. The OLAP service marked the beginning of Microsoft's business intelligence offerings based on data stored in SQL Server (and is now known as SQL Server Analysis Services, as noted below).
Figure 11: The first introduction of support services for SQL Server in SQL Server 7.0 (1998)
SQL Server 2000 saw the addition of SQL Server Replication Services (Figure 12). This feature made it possible to take data stored in one database and replicate it in another, moving the database one step closer to becoming the source of all data in the enterprise. Since the 2000 release, SQL Server has been able to import and export data to and from other databases, and in most cases the data can be kept in a consistent state automatically. Developers should note that managing data consistency is best achieved using the replication services instead of doing the work at the application level.
In SQL Server 2000, OLAP was also renamed to SQL Server Analysis Services, reflecting the addition of data mining and other features that go far beyond the definition of OLAP. The data gathered by the service can be accessed programmatically. The SSAS API exposes essential data analysis features.
In 2000 the database also extended to client devices: a lightweight database engine was released for devices running Windows CE. Today this engine is known as SQL Server Compact Edition. The database runs on lightweight client devices such as cell phones and Pocket PCs, giving mobile application developers access to the benefits of the database programming model along with its ecosystem of value-added services.
Figure 12: Replication Services and SQL Server Compact Edition were introduced with SQL Server 2000. Analysis Services also absorbed OLAP and added new features such as data mining, leading to the new name.
In 2004 SQL Server Reporting Services (SSRS) was released as an add-on for SQL Server 2000. Companies demand reporting capabilities on the data and data-related events stored in their databases. Designed as a business intelligence tool, SSRS bridged the gap from data created by operations to data used for decision making. The SSRS API and report language are the fastest way to generate and execute meaningful reports: the developer can spend their time thinking about what goes into a report, while formatting and data gathering are handled by the service.
Figure 13: The introduction of SQL Server Reporting Services in 2004.
In our minds data is something we think about in ethereal terms. Conceptually it exists far removed from the place it is stored; it is pure, never inconsistent or out-of-date. In the physical world of hardware, however, achieving the goal of ubiquitous data is not easy. Starting in SQL Server 2005, several new services were created to isolate the application from having to worry about storage concerns. While we have not truly reached the state of ubiquitous data yet, SQL Server has made and continues to make big steps forward.
SQL Server 2005 saw the re-launch of DTS as SQL Server Integration Services (SSIS). The new paradigm of data warehousing required data to be extracted, transformed, and loaded (ETL) into ever expanding databases. While still retaining the capabilities of its predecessor, SSIS added new transformation capabilities and increased performance. As before, SSIS is the application model for workflow applications close to the database.
As shown in Figure 14, Microsoft SharePoint entered the picture with SQL Server 2005. Since day one the data contained in SharePoint wiki sites has been stored in SQL Server databases, so SharePoint data is naturally integrated. That is, since 2005 this data has been directly accessible through Reporting and Analysis Services. This integration makes the vast amount of data created every day by employees on wiki sites available to those management features. To the developer this means that SharePoint data becomes accessible through the SQL programming model, and a properly set up system can speed up programming SharePoint-related applications.
SQL Server’s success also brought new issues: applications became too popular for a single database to handle all the traffic. SQL Server Service Broker elevated the programming model from authoring single applications to developing entire systems. The framework allows producers and consumers to collaborate in a low-latency, high-throughput system, with scale achieved through load balancing between multiple instances of a component in the system.
Figure 14: DTS was changed to SQL Server Integration Services in SQL Server 2005, which also included the addition of Service Broker and natural integration with SharePoint databases.
Since 2005, the industry has witnessed tremendous expansion in the mobile device market. The number of devices producing and consuming data has been simply exploding.
It is very tempting for a mobile-phone app developer to think of data as accessible and current regardless of the device’s location and state. For this reason, the Microsoft Sync Framework introduced in 2008 addressed the data needs underlying this scenario. It allows data to be synchronized across any number of devices using a number of different synchronization and conflict resolution algorithms.
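One of the simplest such policies—last-writer-wins, keyed on a per-row version—can be sketched as follows. This toy illustration is not the Sync Framework's API; the merge function and record shapes are invented to show the idea of reconciling two replicas of the same data.

```python
def sync(local, remote):
    """Merge two replicas of {key: (version, value)}; higher version wins."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        # Take the remote row if we've never seen it, or if it's newer.
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged

# Two replicas that diverged while the device was offline.
phone  = {"contact:1": (3, "Alice mobile"), "contact:2": (1, "Bob")}
server = {"contact:1": (2, "Alice"),        "contact:3": (1, "Carol")}

print(sync(phone, server))
```

Real synchronization adds change tracking, tombstones for deletes, and richer conflict-resolution choices, but the core task is exactly this reconciliation.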
SQL Server 2008 R2 added SQL Server Master Data Services. The service builds on the idea that some data in the enterprise is authoritative: this data is accessible through a single trusted source, Master Data Services, and is known to be correct at all times. Applications can get at this data using the same data access technologies that are available for the database.
In 2009 the database also moved to the cloud with the introduction of SQL Azure. SQL Azure hosted the first customer database in Microsoft-operated datacenters. The cloud database model removes hardware management from the list of concerns for the database operator. Developers can now truly think of the database as an unlimited source of data residing somewhere in the ether. Microsoft takes over most of the concerns associated with operating a database server.
Thus the current view of support services in SQL Server is shown in Figure 15.
Figure 15: SQL Server and SQL Azure with support services as of September 2010.