New information has been added to this article since publication.
Refer to the Editor's Update below.

Flex Your Data

Teach Old Data New Tricks with the Fully Loaded Advances in ADO.NET 2.0

Julia Lerman

This article is based on a prerelease version of the .NET Framework 2.0. All information herein is subject to change.

This article discusses:

  • ADO.NET 2.0 classes that will simplify coding
  • Faster reading and writing of data
  • Interacting with SQL Server 2005
  • Threading, reusing connections, and other optimizations
This article uses the following technologies:
.NET Framework, ADO.NET, SQL Server

Contents

The DataTable as a First-Class Citizen
IDataReader
DataTableReader
Batch Updates
Binary Remoted DataSets
RowState
The ProviderFactory
SQL Server Updates
Threading, Notifications, and Bulk Copy

The Microsoft® .NET Framework 2.0 delivers an updated ADO.NET that will streamline your data access development efforts. The ADO.NET team has worked with both the System.Xml team and the SQL Server™ team to provide a consistently rich developer experience, crossing technology boundaries from ADO.NET to XML and back. This will all be apparent as you dig into ADO.NET 2.0. There are improvements on so many fronts that you will certainly be pleased.

Rather than attempt to list every new and updated feature of ADO.NET 2.0, I will discuss in detail some of the more interesting improvements and focus on performance and flexibility.

The DataTable as a First-Class Citizen

In the .NET Framework 1.x, DataTables did not support XML serialization on their own. Specifically, the DataTable did not support the ReadXml and WriteXml methods native to the DataSet. In addition to leveraging these methods to persist data, the ability to send a DataSet to a Web service (or return one from a Web service) is quite convenient; however, you could not do this with a DataTable. And because Microsoft chose not to document and publicly support the IXmlSerializable interface in the .NET Framework 1.x, it's not apparent from the documentation that DataSet implements that interface. With the .NET Framework 2.0, not only are you encouraged to use the IXmlSerializable interface, but the System.Data.DataTable class now implements IXmlSerializable as well. This will make life simpler because you will not be forced to use a DataSet when all you need is a table. ReadXml, WriteXml, and GetSchema, previously unique to the DataSet, are now part of the DataTable.

Because of this, you can now pass DataTables to and from .NET-based Web services. Although these changes will not make earth-shattering differences in your code, working with DataTables will be much easier. It will also alleviate some confusion regarding when to use a DataTable versus a DataSet. Interestingly, if you were to compare the resulting XML from a persisted DataTable to the XML of a persisted DataSet that contains only that one DataTable, you would see that the XML is quite similar, as both work directly with the DataTable.
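For example, a standalone DataTable can now be persisted and rehydrated directly. Here's a minimal sketch; the file name and table schema are hypothetical:

```vb
' Build a simple table (schema and data are illustrative)
Dim authors As New DataTable("Authors")
authors.Columns.Add("Name", GetType(String))
authors.Rows.Add("Cheryl Carson")

' Persist the table and its schema without wrapping it in a DataSet
authors.WriteXml("Authors.xml", XmlWriteMode.WriteSchema)

' Later, rehydrate into a fresh DataTable
Dim restored As New DataTable()
restored.ReadXml("Authors.xml")
```

Because the schema was written along with the data, ReadXml can reconstruct the table without any help from a DataSet.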

Another DataSet method that DataTable now supports is Merge. The DataTable.Merge method allows you to pull in another DataTable (with the same PreserveChanges and MissingSchemaAction options as in the .NET Framework 1.x), but you can't bring in a DataRow collection as you can with DataSet.Merge. If a DataRow collection contains rows from separate DataTables, the DataSet.Merge method can combine the various rows and create a new DataTable when necessary. But if that same DataRow collection were used with DataTable.Merge, there would be no way to handle the rows from other DataTables. Therefore, to avoid problems, that overload is not supported by DataTable.

IDataReader

[Editor's Update - 3/15/2005: ASP.NET data-bound controls can bind to data sources that support IListSource or IEnumerable. While the IDataReader interface itself does not derive from either IListSource or IEnumerable, all of the .NET Framework classes that implement IDataReader also implement IEnumerable, including SqlDataReader. Thus, while you can't count on being able to bind an arbitrary IDataReader implementation to a control like the DataGrid, you can in the case of SqlDataReader.] In the .NET Framework 1.x, this means you have to create a DataTable or DataSet in the data access layer and then pass it back to the UI if you want to perform data binding.

In ADO.NET 2.0, Microsoft has responded to customer feedback and exposed the functionality behind Fill so that you can load data into a DataSet or DataTable without a DataAdapter using the Load method. With Load, the process of populating a DataTable or DataSet is no longer explicitly tied to a DataAdapter and therefore no longer requires a connection to the database in some scenarios. This means you can return the more efficient DataReader from your data layer and still benefit from the functionality of the DataTable and DataSet, which you can create in the layer where you need to use it. You now have better control over where to expend your resources. As always, it's important to remember that a DataReader maintains a connection to your database and you must close that connection, especially if you are moving it out of the data layer. Load also works with a new disconnected relative of the DataReader, DataTableReader, which I discuss later.
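Here's a minimal sketch of the pattern (connection string and query are placeholders): the data layer returns a SqlDataReader, and the calling layer loads it into a DataTable with no DataAdapter in sight.

```vb
' In the data layer: return a reader. CommandBehavior.CloseConnection
' ensures the connection closes when the reader is closed.
Dim conn As New SqlConnection(connectionString)
conn.Open()
Dim cmd As New SqlCommand( _
    "SELECT au_id, au_lname FROM dbo.authors", conn)
Dim reader As SqlDataReader = _
    cmd.ExecuteReader(CommandBehavior.CloseConnection)

' In the calling layer: populate a DataTable directly from the reader.
' An overload of Load also accepts a LoadOption to control merging.
Dim authors As New DataTable("Authors")
authors.Load(reader)
reader.Close() ' closes the underlying connection as well
```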

The Load method also has a LoadOption parameter which lets you choose the merging mode if there are already rows in the object you are loading into. These options are OverWriteRow, PreserveCurrentValues, and UpdateCurrentValues. The Beta 1 documentation explains these options in great detail. Microsoft has made merging data much easier with this functionality. In fact, these same options have been added to the DataSet.Merge method and the new DataTable.Merge method, which I previously discussed.

DataTableReader

The DataTableReader is a new class that inherits from another new base class, DbDataReader (a non-database-specific implementation of IDataReader). DataTable and DataSet both have a CreateDataReader method (called GetDataReader in earlier releases of the Beta), which returns a DataTableReader with one resultset for each DataTable represented by the container. Deleted rows will not be pushed into the DataTableReader. You can use the DataTableReader to iterate through data, to stream data elsewhere, and to load data into a DataSet or DataTable.
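Creating and consuming a DataTableReader takes only a few lines. A quick sketch (the helper that supplies the table is hypothetical):

```vb
Dim dt As DataTable = GetAuthorsTable() ' hypothetical helper
Dim reader As DataTableReader = dt.CreateDataReader()

While reader.Read()
    Console.WriteLine(reader("au_lname"))
End While
reader.Close()
```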

DataTableReader is a great way to iterate through a DataSet or DataTable without the overhead of those larger classes. Another advantage is that if any rows are deleted from the source DataTable while you are in the middle of iterating, the record pointer will remain in its correct position. This is quite different from SqlDataReader, which places a reference constraint on the source data, preventing deletions. Additionally, if new rows are added ahead of the DataTableReader's record pointer, they will be available as you continue to iterate through your DataTable.

Although you can't access the Tables property of a DataTableReader in code, you can see it if you inspect the object in the debugger. The schema of the DataTable(s) is absorbed into the DataTableReader, which is how the class is able to iterate not only through the tables, but through the columns and rows. Note that DbDataReader implements IEnumerable, and thus has the functionality of a collection, which is how you can drill into this data in the debugger.

Figure 1 shows part of the Visual Studio® QuickWatch window for a DataTableReader. Note the objects with the lock icons. The DataTableReader has knowledge of the DataTable from which it has been derived, though the DataTable's interface is not exposed.

Figure 1 DataTableReader in the Visual Studio QuickWatch Window


Like the DataReader, however, the DataTableReader is forward-only, and data is accessed by relative position. If there are multiple resultsets, you can move forward from one to the next with the NextResult method. You jump into the DataTableReader with the Read method, which positions the pointer on the first row of the first resultset. As with the .NET Framework 1.x SqlDataReader and related classes, data is accessed using the Item property or one of the type-specific methods, such as GetDateTime.

It is important to remember that unlike the DataTableReader, database-specific DataReaders stream directly from the database and maintain an open connection. Passing a DataReader from one application layer to another is not recommended as you are dragging the connection along with you. A DataReader whose connection is not properly closed can wreak havoc on your resources. You won't need to worry about this with the DataTableReader.

Batch Updates

If you fancy yourself a hard-core developer and find the DataAdapter commands to be too much like wizards, the DataAdapter's new UpdateBatchSize property (available in the SqlDataAdapter and OracleDataAdapter classes) may change your mind.

In the .NET Framework 1.x, the need to update multiple records forced you to choose among efficiency, performance, and security. Using the DataAdapter's built-in Update command can save significant coding, but in the background it still performs one database call per record.

Enter the UpdateBatchSize property of the DataAdapter. When batch updating is used, the DataAdapter builds a series of parameterized commands and sends them all at once to the server during an update. This could be done manually, but it is complex and as a result is not always the best option for many developers.

There has been a big change in batch updating since the earlier betas. In Beta 1, a query similar to Figure 2 (as captured by the SQL Profiler receiving a batch of three update queries) was passed to the database. Unfortunately, the SQL Server limit of 2,100 parameters could be quickly reached with this approach. The ADO.NET team has revised the internals of the function so that there is no more concern about the number of parameters in the query. If you send a batch of more than one update query to SQL Server, you can now see in SQL Profiler that each update query is run individually.

Figure 2 Batch Update Query in Beta 1

exec sp_executesql N'BEGIN
  exec sp_executesql @P1, @P2, @P3, @P4, @P5, @P6, @P7, @P8;
  exec sp_executesql @P1, @P2, @P9, @P10, @P11, @P12, @P13, @P14;
  exec sp_executesql @P1, @P2, @P15, @P16, @P17, @P18, @P19, @P20;
END ',
N'@P1 ntext,@P2 ntext,@P3 smallint,@P4 int,@P5 int,@P6 money,@P7 smallint,
@P8 real,@P9 smallint,@P10 int,@P11 int,@P12 money,@P13 smallint,@P14 real,
@P15 smallint,@P16 int,@P17 int,@P18 money,@P19 smallint,@P20 real',
@P1=N'UPDATE [Order Details] SET [Quantity] = @p1
  WHERE (([OrderID] = @p2) AND ([ProductID] = @p3) AND ([UnitPrice] = @p4)
  AND ([Quantity] = @p5) AND ([Discount] = @p6))',
@P2=N'@p1 smallint,@p2 int,@p3 int,@p4 money,@p5 smallint,@p6 real',
@P3=2,@P4=10284,@P5=60,@P6=27.20,@P7=1,@P8=0.25,
@P9=2,@P10=10284,@P11=67,@P12=11.20,@P13=1,@P14=0.25,
@P15=2,@P16=10285,@P17=1,@P18=14.40,@P19=1,@P20=0.2

You can control the UpdateBatchSize property yourself, defining the number of rows to update in one batch which can be fine-tuned for efficiency based on your resources. The greatest benefit of batch updating is that you will have fewer round-trips to the database server if you have a large number of rows to update.
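Enabling batching is largely a one-property change. A sketch using the Northwind Order Details table (the command and table variable are illustrative):

```vb
Dim adapter As New SqlDataAdapter()
adapter.UpdateCommand = New SqlCommand( _
    "UPDATE [Order Details] SET Quantity = @qty " & _
    "WHERE OrderID = @oid AND ProductID = @pid", conn)
With adapter.UpdateCommand
    .Parameters.Add("@qty", SqlDbType.SmallInt, 0, "Quantity")
    .Parameters.Add("@oid", SqlDbType.Int, 0, "OrderID")
    .Parameters.Add("@pid", SqlDbType.Int, 0, "ProductID")
    ' Batched commands must not try to refresh each row after executing
    .UpdatedRowSource = UpdateRowSource.None
End With

' Send updates to the server in groups of 50 rather than one at a time;
' a value of 0 means "use the largest batch size the server can handle"
adapter.UpdateBatchSize = 50
adapter.Update(orderDetailsTable)
```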

Bigger, however, is not always better with UpdateBatchSize. For example, with a local database it might be better to make more round-trips to the server with smaller batch sizes, whereas connecting to a remote database might demand the inverse. There are many factors you will need to consider relating to your resources, from the speed of the client machines to the latency of your network when you are determining how large to make the batches. Tuning batch sizes based on your own variables is recommended.

The new ToTable method of the DataView lets you create a new table based on a sorted or filtered DataView of another DataTable. If the original DataTable that the DataView is based on is part of a DataSet, the DataTable returned by this method still has an association with the originating DataSet. Because of this, you cannot add this DataTable to another DataSet directly. You will get an ArgumentException stating that the DataTable already belongs to another DataSet. As in the .NET Framework 1.x, you can use a copy of the DataTable instead:

myDataSet.Tables.Add(MyNewTable.Copy)

Figure 3 Visualizer


One of the overloads of ToTable accepts a Boolean that filters distinct rows, a feat that has been difficult to accomplish in the past. For example, if you have a DataTable whose rows contain author names, addresses, and contracts, it would be quite a chore to create a list of the unique states represented by those authors, or a list of authors with and without contracts in each state. You would need to create a routine that iterates through the rows and compares values from one row to another. Although the DataView itself can't filter on unique values, the ToTable method allows you to create a separate table that filters out rows whose values in the specified columns are duplicates. Along with the Boolean, you pass in a parameter array of the columns on which you would like to filter. Lastly, another overload of ToTable accepts an output table name as an additional parameter:

Public Function DVtoNewTable() As DataTable
    Dim ds As DataSet = GetAuthors()
    Dim dt As DataTable = ds.Tables(0)
    Dim dv As DataView = dt.DefaultView
    Return dv.ToTable("StatusbyState", True, _
        New String() {"state", "contract"})
End Function

Figure 3 shows the Visual Studio 2005 DataSet/DataTable Visualizer examining the result of this method.

Binary Remoted DataSets

One of the benefits of remoting over XML Web services is that data can be transported in binary format as well as XML, unless that data is in a DataSet. Even when the binary formatter is used, DataSets are serialized as XML before being sent through the pipe, causing a performance hit. In ADO.NET 2.0, this has been fixed: remoted DataSets can now be truly serialized as binary data. For more information, see Cutting Edge: Binary Serialization of DataSets.
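You opt in to the new behavior through the DataSet's RemotingFormat property. A sketch (serializing to a MemoryStream just to illustrate; the helper that supplies the DataSet is hypothetical):

```vb
Dim ds As DataSet = GetAuthors() ' hypothetical helper
ds.RemotingFormat = SerializationFormat.Binary

Dim formatter As New BinaryFormatter()
Dim stream As New MemoryStream()
formatter.Serialize(stream, ds) ' now truly binary, and much smaller
```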

RowState

RowState can sometimes be a mysterious beast, especially when you are adding and modifying rows. Consider the following states as you step through a typical procedure to create a DataRow, add it to a DataTable, and modify it:

  • Create new DataRow via DataTable.NewRow: Detached
  • Modify data in the DataRow: Detached
  • Add the row via DataTable.Rows.Add: Added
  • Modify the row again: Added
  • Update the DataTable via AcceptChanges: Unchanged
  • Modify the row again: Modified

The resulting RowStates are as you would expect. But what if you receive data from a Web service or through remoting and you need to use it to update your local datastore? If you are using the DataAdapter Update method, these records will, in most cases, be ignored because they will likely be flagged as Unchanged.

Two new methods have been added to the DataRow class to deal with this: SetAdded and SetModified. These methods allow you to change the state of an Unchanged row to either Added or Modified, giving you a little more granular control over the RowState. Note that in the current version of the Beta (the November Beta 1 Refresh at the time of this writing) the IntelliSense® is not quite correct in its descriptions. SetAdded tells you that it will "reject changes to a row since AcceptChanges was last called." SetModified is actually displaying the description for the method directly above it, SetColumnError, so beware. While these two methods can be misused, they can help you solve some concurrency problems.
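A sketch of the pattern: rows deserialized from a Web service arrive flagged as Unchanged, so you mark them before handing them to a DataAdapter (the table and adapter variables are assumed to exist):

```vb
' Flag each received row as Modified so adapter.Update sends an UPDATE
For Each row As DataRow In receivedTable.Rows
    row.SetModified()
Next
adapter.Update(receivedTable)
```

SetAdded works the same way when the rows should instead be inserted into the datastore.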

The ProviderFactory

There is a collection of new classes in the System.Data.Common namespace that makes it easier to handle a variety of databases. Previously, if you wanted to accommodate different databases, you might have written separate blocks of code for each database type. In the .NET Framework 1.x, SqlConnection and OleDbConnection have no common data-related ancestor in their inheritance, and the same is true for SqlCommand and OleDbCommand. The best option for writing database-independent code was to use the interfaces (IDbConnection, IDbCommand, and so on).

The new ProviderFactory classes let you create provider-independent connections, commands, adapters, and more. You can then code against those and pass in the name of the actual namespace you want to work with at run time. Here's a quick look at how that works:

Public Sub ProviderTests(ByVal providerName As String)
    Dim provider As DbProviderFactory = _
        DbProviderFactories.GetFactory(providerName)
    Dim connection As DbConnection = provider.CreateConnection()
    Dim adapter As DbDataAdapter = provider.CreateDataAdapter()
End Sub

This is incredibly simple to create and use. In this example, provider name strings (like "System.Data.SqlClient") could be stored in a resource file, the application configuration file, or with any other mechanism you use for storing configuration data.

SQL Server Updates

Because the ADO.NET team is part of the SQL Server organization at Microsoft, it should come as no surprise that special attention has been paid to the interactions between ADO.NET and SQL Server. Much has and will be written about the fact that SQL Server 2005 is capable of hosting the CLR. The System.Data.Sql and System.Data.SqlServer namespaces contain the tools for interacting with .NET from SQL Server 2005. Note that the System.Data.SqlServer namespace from earlier releases of SQL Server 2005 betas is targeted to be removed in the Beta 3 version. Its functionality will be wrapped into the SqlClient namespace. But let's take a look at a few of the many ways in which ADO.NET 2.0 now makes use of some specific SQL Server functionality. All of these functions are available solely in the SqlClient namespace.

One of the big changes in SQL Server 2005 is the addition of Multiple Active Resultsets (MARS). You have probably heard the puns such as "Data is from Venus, Results are from MARS."

With MARS, the SQL Server team changed how a resultset is delivered from SQL Server and the ADO.NET team has taken advantage of this. The result is that you can now get multiple DataReaders from the same connection, one of the most requested features of ADO.NET. MARS is enabled by default when connecting to SQL Server 2005, but can be configured with the Boolean parameter MultipleActiveResultSets in the connection string, which you can use to deactivate MARS if you want.
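A sketch of two active readers sharing one connection (the connection string and queries are placeholders against AdventureWorks):

```vb
Dim connString As String = _
    "Data Source=.;Initial Catalog=AdventureWorks;" & _
    "Integrated Security=True;MultipleActiveResultSets=True"

Using conn As New SqlConnection(connString)
    conn.Open()
    ' Each reader needs its own command, but they share one connection
    Dim cmd1 As New SqlCommand( _
        "SELECT FirstName, LastName FROM Person.Contact", conn)
    Dim cmd2 As New SqlCommand( _
        "SELECT SalesOrderID, OrderDate FROM Sales.SalesOrderHeader", conn)
    Using r1 As SqlDataReader = cmd1.ExecuteReader()
        Using r2 As SqlDataReader = cmd2.ExecuteReader()
            ' Both readers are open and usable here
        End Using
    End Using
End Using
```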

It is important to remember that MARS requires separate commands to generate multiple DataReaders from one connection, though you can use the same command to execute Updates and Inserts, and so on. Also remember that MARS comes with resource overhead and you should consider comparing the performance of reused SqlConnection objects (which are simpler to code) to the performance that connection pooling already achieves. MARS will not always improve performance. But to be able to wrap multiple commands into one transaction process, a performance hit may be an acceptable price to pay.

When using transactions with MARS, creating multiple transactions within one connection is not allowed and will throw a System.InvalidOperationException. Though database transactions (such as SqlTransaction and OleDbTransaction) are still available, you should definitely explore the powerful TransactionScope class in the new System.Transactions namespace. TransactionScope can be leveraged for local transactions as well as distributed transactions. For more information on the System.Transactions namespace, see John Papa's February 2005 Data Points column.
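The TransactionScope pattern is simple: work done inside the scope commits only if Complete is called. A sketch (the command and connection string are placeholders, and a reference to System.Transactions is assumed):

```vb
Using scope As New TransactionScope()
    Using conn As New SqlConnection(connString)
        conn.Open()
        Dim cmd As New SqlCommand( _
            "UPDATE dbo.authors SET contract = 1 WHERE au_id = @id", conn)
        cmd.Parameters.AddWithValue("@id", "172-32-1176")
        cmd.ExecuteNonQuery()
    End Using
    scope.Complete() ' without this call, the transaction rolls back
End Using
```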

This type of connection reuse was not a problem for DataAdapters prior to ADO.NET 2.0. When you fill a DataSet with a DataAdapter, the data is basically pushed all the way out of the pipe so that the connection is free and clear for another call.

Threading, Notifications, and Bulk Copy

Writing multithreaded applications can be complex, but the benefits can be huge. One of the innovative new features in ADO.NET 2.0 is true asynchronous processing in the SqlCommand class. The SqlCommand's async methods are a great addition for middle-tier processing and are handy when, for example, you have to execute many queries in a row (given that they are not dependent on each other). These queries can now run simultaneously. For an in-depth look at these features, see the MSDN online article "Asynchronous Command Execution in ADO.NET 2.0" by Pablo Castro. This is a powerful addition to ADO.NET and Pablo's article outlines its use.
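A minimal sketch of the Begin/End pattern (the query is a placeholder; note that asynchronous processing must be enabled in the connection string):

```vb
Dim connString As String = _
    "Data Source=.;Initial Catalog=pubs;" & _
    "Integrated Security=True;Asynchronous Processing=True"

Using conn As New SqlConnection(connString)
    conn.Open()
    Dim cmd As New SqlCommand( _
        "SELECT au_id, au_lname FROM dbo.authors", conn)

    ' Kick off the query and do other work while it runs
    Dim result As IAsyncResult = cmd.BeginExecuteReader()
    ' ... other processing here ...

    ' Block (if necessary) until the results arrive
    Using reader As SqlDataReader = cmd.EndExecuteReader(result)
        While reader.Read()
            Console.WriteLine(reader("au_lname"))
        End While
    End Using
End Using
```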

At this point, while it may seem that combining asynchronous commands with MARS would create some extremely impressive performance scenarios, the performance gain would, as a matter of fact, be quite minimal and probably not worth the complexity.

Two more features specific to SQL Server in ADO.NET 2.0 are Data.Sql.SqlNotification and Data.SqlClient.SqlDependency. SQL Server 2005 can now push change notifications into a queue by hooking the SqlNotification or SqlDependency object to your SqlCommand. Then you can either subscribe to the queue through a special RECEIVE query or just poll the queue yourself. If you use the RECEIVE query, you can create an event handler to capture the messages from SQL Server when changes have been made.

SqlDependency is a higher-level abstraction of SqlNotification that uses the default behaviors of SqlNotification and is a little easier to code against. For implementation details, check the Beta 1 docs or some of the other resources found in this article.

There are some important rules for using notification:

  • When referencing a table in your query, you must use a two-part name. If the table has a single-part name, such as the authors table in the pubs database, append the owner name (dbo.authors). Where the table already has a two-part name, such as person.contact in AdventureWorks, person.contact is sufficient.
  • You must specify the columns being returned rather than querying with "*" as in "select * from authors".
  • Notification is designed to scale for large numbers of common queries in which only a few parameters change; using numerous ad hoc queries instead would significantly impair the performance of your application.
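A sketch of wiring up a SqlDependency (the handler and query are illustrative, and the server-side permissions mentioned earlier are assumed to be in place):

```vb
Private Sub OnDependencyChange(ByVal sender As Object, _
        ByVal e As SqlNotificationEventArgs)
    Console.WriteLine("Data changed: " & e.Info.ToString())
End Sub

Public Sub RegisterNotification()
    SqlDependency.Start(connString) ' once per application domain

    Using conn As New SqlConnection(connString)
        conn.Open()
        ' The query obeys the rules: two-part name, explicit column list
        Dim cmd As New SqlCommand( _
            "SELECT au_id, au_lname FROM dbo.authors", conn)

        Dim dependency As New SqlDependency(cmd)
        AddHandler dependency.OnChange, AddressOf OnDependencyChange

        Using reader As SqlDataReader = cmd.ExecuteReader()
            ' The results must be consumed for the subscription to register
        End Using
    End Using
End Sub
```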

There are more rules listed in SQL Books Online under Query Notifications. Additionally, there are some very specific rules to follow (granting permissions, setting datatable properties, and so on) in SQL Server 2005. (See "Query Notifications in ADO.NET 2.0".)

It is important not to get the SqlDependency and SqlNotification classes confused with the System.Web.Caching.SqlCacheDependency class. This ASP.NET class can not only use SQL Server 2005 query notification, but can also work with SQL Server 2000 and SQL Server 7.0. The ADO.NET classes will not know how to work with the previous versions of SQL Server, however. For more information on caching in ASP.NET 2.0, take a look at "Caching Improvements in ASP.NET Whidbey," by G. Andrew Duthie and "Improved Caching in ASP.NET 2.0," by Stephen Walther. You can also find a thorough examination of notification and SQL Server 2005 in A First Look at ADO.NET and System.Xml v.2.0, by Alex Homer, Dave Sussman, and Mark Fussell (Addison-Wesley, 2004).

If you have ever considered using ADO.NET to get data from one source into SQL Server, you have probably been shocked by how much longer it takes than a simple Data Transformation Services (DTS) transfer. That's because with ADO.NET, you end up writing one record at a time to the database in the background. The new SqlBulkCopy class enables bulk inserts of data into SQL Server from within your .NET-based application. Even leveraging the new batch updates will not come close to the SqlBulkCopy class in ADO.NET 2.0, which is nearly as fast as DTS.

SqlBulkCopy lets you push data from a DataTable, a collection of DataRows, or a class that implements IDataReader with just a few lines, as shown here:

Dim connDest As New SqlConnection(sqllocal)
connDest.Open()
' The helper routine opens the source connection (connSrc)
' and returns a reader over it
Dim dr As SqlDataReader = MyRoutinetoPopulateaDataReader()
Dim oBCP As New SqlBulkCopy(connDest)
oBCP.DestinationTableName = "MyDestinationTable"
oBCP.WriteToServer(dr)
dr.Close()
oBCP.Close()
connSrc.Close()
connDest.Close()

In my own tests when transferring a SQL Server 2000 table of over a hundred thousand records into a SQL Server 2005 database, SqlBulkCopy performance was almost identical to DTS, whereas using batch updates was orders of magnitude slower. So now it is quite conceivable to have a programmatic solution to moving large amounts of data without having to rely on external processes to make it happen.

Every application that you write can benefit from the advances in ADO.NET 2.0. Here I've mentioned just some of the new and enhanced features. You will find that the MSDN Documentation that is supplied with Beta 1 (which can also be accessed at msdn2.microsoft.com/library) has numerous articles of interest as well.

Julia Lerman is a consultant who has been developing database applications and consulting to businesses since 1984. She is also a conference speaker, a Microsoft .NET MVP, an ASPInsider, leader of the Vermont .NET User Group, and an INETA board member. Julia lives in Vermont and blogs at thedatafarm.com/blog.