This documentation is archived and is not being maintained.

Consuming ADO.NET DataSets in the Microsoft Office System

Office 2003

Don Kiely
Information Insights

March 2004

Applies to:
    Microsoft® Office Word 2003
    Microsoft Office Excel 2003
    Microsoft Visual Studio® Tools for the Microsoft Office System

Summary: Data interoperability with Microsoft Office–based applications is an important part of real-world solutions. ADO.NET is the powerful, flexible data access layer in the Microsoft .NET Framework that makes data interoperability with any application easier and more extensible. This article discusses how to create ADO.NET DataSet objects for use with applications in the Microsoft Office System, thus creating the vital link between Office solutions and ADO.NET. (21 printed pages)


ADO.NET Data Access Tier
Populating Documents from a DataSet
Using Visual Studio Tools for the Microsoft Office System


The Microsoft .NET Framework and Microsoft® Visual Studio® .NET development tools set high standards in the Microsoft Windows® development environment for making programming more efficient, making applications more secure, and providing a solid foundation for distributed applications. Microsoft .NET–based tools make Windows programming fun again.

Microsoft is gradually incorporating the .NET Framework into its various products, but direct support has not yet made it into the Microsoft Office System. There are many extensions and new features implemented with the .NET Framework in the Microsoft Office System, but the core objects are still COM-based, as they have been for years.

That doesn't mean that the Microsoft Office System and .NET-based development are incompatible. To the contrary, Microsoft has built COM interop into the Microsoft Office System, making it easy to combine the best features of both platforms. One of the most exciting tools is Microsoft Visual Studio Tools for the Microsoft Office System, which lets you write managed code that executes within a Microsoft Office Excel 2003 or Microsoft Office Word 2003 document.

A common programming task is reading data from a database and using it in an Office document. Microsoft Visual Basic® for Applications (VBA) has supported various data access technologies for a long time, including Data Access Object (DAO) and, more recently, Microsoft ActiveX® Data Objects (ADO). Although these are fine for many programming needs, ADO.NET has much-improved features for working with databases, particularly when creating disconnected data sets and integrating with XML. The ADO.NET object you use most often with the Microsoft Office System documents is a DataSet, which can contain multiple sets of data from a database. But because there is no easy way to use ADO.NET objects from a VBA application, you have to write .NET-compatible code to create an object and insert it into an Office document.

ADO.NET Data Access Tier

ADO.NET is the data access portion of the .NET Framework, contained within the System.Data namespace. It provides a rich set of objects for both connected and disconnected access to almost any kind of data store, including relational databases. When working with DataSets, you generally use Connection, Command, Data Adapter, and DataSet objects. That seems like a lot of work, but they are easy to use and you can frequently bypass explicitly creating one or more of the objects.

When populating a DataSet with data it is common to use SqlConnection and SqlDataAdaptor objects. The SqlConnection object contains the basic information about where the data comes from. The SqlCommand object combines the connection object with the specific instructions executing against the database, including the SQL statement or stored procedure name and any parameters needed. The SqlDataAdapter takes that command and connection information to connect to the database and returns data, finally placing it in a DataSet. Note that the DataSet has no information about where the data came from, whether it came from Microsoft SQL Server or any other data store, or was created in memory on the fly.

You can use a SqlDataReader object for a persistent connection to the database while reading data. This object provides very fast forward-only, read-only access, ideal for grabbing data quickly from the database. The SqlDataReader is specific to the database providing the data because it remains connected to the database until closed or until the last data is read.

Any data access using ADO.NET DataSets uses two sets of objects: one set specific to the type of data store your code accesses, and another that contains the data without any features specific to a particular data store. For example, to use the Northwind database in Microsoft SQL Server 2000 you use the System.Data.SqlClient namespace in ADO.NET and the .NET Framework for efficient data access. Objects in the System.Data.SqlClient namespace are designed to work with SQL 7.0 and later, and if you are using another back-end database, you need to use different but equivalent objects, such as those for OLE DB, Oracle, or ODBC databases.

Let's see how to use the ADO.NET objects in a data access layer as the foundation for consuming them in an Office application.

Creating a DataSet

Suppose that you need to work with a flexible data object, one that lets you include multiple tables, set relationships between the tables, and read the data forward and backward.

The ADO.NET object that provides these features—and many more—is the DataSet. A DataSet is a disconnected object, meaning that once you fill it with data there is no persistent connection to the database. The network connection to the database server can even go down without affecting your use of the data in a client application. One of the advantages of this technique is that you free up connections for other uses, and your application is no longer dependent on the connection. However, it also means that data in the database can change after populating the DataSet, so it is good to be aware of the possible discrepancy.

The data access layer can create multiple DataSet objects. For example, you may use one DataSet to contain static data for populating list boxes in the user interface, and you may make that object accessible to many users. Another object may contain data a user updates, simplifying the code necessary to propagate the changes back to the database. Or you can use one object for all uses.

Figure 1 shows the structure of the tables in the Northwind database that contain customer and order information. You can create a DataSet with a structure that mirrors the database, or change it entirely to suit the needs of the application.


Figure 1. The structure of the tables used in the Northwind Customers DataSet

The code below defines a DataSet with the full list of customer order data in the GetDataSet method. It takes a bSaveSchema parameter to indicate whether the procedure should save a copy of the DataSet object's XML and XML schema to disk files for use in other parts of the application.

The GetDataSet code starts by instantiating a new, empty DataSet object called NorthwindCustomerOrders. If you do not supply a name for the DataSet it is called NewDataSet by default. This doesn't matter most of the time, but when you generate and use the XML from the DataSet, this name is sometimes used as part of the structure:

Public Shared Function GetDataSet(ByVal bSaveSchema As Boolean) As DataSet
  'Instantiate the DataSet, giving it a custom name
  '(The default name is NewDataSet)
  Dim ds As New DataSet("NorthwindCustomerOrders")

The next step sets up the database connection, reading the connection string from an application configuration file:

  'Set up the connection to the database, reading from the App.config  
  Dim sqlCnn As New SqlConnection
  sqlCnn.ConnectionString = _

There are several ways to create a DataTable object within a DataSet to contain sets of related data. One way is to create a string variable with the SQL statement. This is mostly done for ease; usually you want to use stored procedures to access data. The simple SELECT statement doesn't require any parameters—it returns the complete set of records in the table.

Most ADO.NET classes, like many others in the .NET Framework, have several constructors with lots of options for how you instantiate objects. You can instantiate the SqlCommand object by passing a string with the SQL statement and the connection object.

The next step is to create a SqlDataAdapter object, passing the constructor a reference to the SqlCommand object. At this point, there is an open connection to the database but it has not yet read any data. As shown in Figure 1, everything is now primed and ready to read data using the data adapter's Fill method. Fill uses two parameters in the following code: the DataSet object to be filled and a unique name (Customers) that later lets the code uniquely reference this table. If you don't include a name, the first table added to a DataSet is named Table, the second Table2, and so on.

  'Create the SQL objects and populate the first DataTable
  Dim sql As String = "SELECT CustomerID, CompanyName, " _
    & "ContactName, ContactTitle, Address, City, Region, " _
    & "PostalCode, Country, Phone, Fax FROM Customers"
  Dim sqlCmd As New SqlCommand(sql, sqlCnn)
  Dim sqlDA As New SqlDataAdapter(sqlCmd)
  sqlDA.Fill(ds, "Customers")

In many applications, you probably create only a single table within a DataSet object. But sometimes you need to work with several sets of records from a variety of tables in the database. You can add as many tables as you wish.

Because you may need to work with each field of data from the table, repeat the steps to add all the tables to the DataSet, reusing the sql string variable, the SqlCommand object, and SqlDataAdapter.

At this point, your DataSet should be fully populated with all of the data it needs from the database, so you can close the connection object. It is always good programming practice to close a connection as soon as you can to free it up to another application. On the other hand, if the current procedure was going to use it again immediately, perhaps to create another DataSet, you can leave it open to avoid the overhead of reestablishing the connection.

Note   If you reuse the exact same connection string for every connection in the application —including any embedded spaces—ADO.NET, by default, pools connections for you. So in most cases, closing and reopening a connection incurs very little overhead.

Let's say that your DataSet is now fully populated, ready for use, and disconnected from the database. In many applications, this is all you need, but you may want to create relationships that establish the hierarchical nature of the data. Create these relationships using the DataRelation class.

Important   Even if you defined referential integrity relationships in your database—and you should—there is no way to recreate automatically those relationships in a DataSet.

Creating a data relationship requires a reference to two or more columns within existing tables in the DataSet. The code uses parent and child DataColumn objects to add a reference to the primary and foreign keys in the two tables. Once you instantiate those objects , you can instantiate the DataRelation object, passing it a name that identifies the relationship along with the parent and child objects.

The code also sets the relationship's Nested property to True. When you read the XML representation of the DataSet, child elements are not automatically nested under their parent objects. Instead, all parent elements are listed first, followed by all child elements, and grouped by level in the hierarchy. It is rare that such a structure is useful, but the Nested property nests the data.

The last step is to add the new DataRelation object to the Relations collection of the DataSet object. Notice that the names for the tables are in quotation marks. You replace these with your own table names.

  'Add the relationships to the DataSet
  'Customers -> Orders
  Dim parent As DataColumn = _
  Dim child As DataColumn = ds.Tables("Orders").Columns("CustomerID")
  Dim sqlRel As New DataRelation("Customer Orders", parent, child)
  sqlRel.Nested = True

The last step in the procedure optionally saves the DataSet object's XML schema and data to a file on the hard disk. This is not necessary when using a DataSet, but in the "Import a DataSet as XML into an Excel 2003 Worksheet" section you see how to put this to use.

Most .NET Framework objects support serialization to one degree or another, but the DataSet object makes it simple with the WriteXml and WriteXmlSchema methods. You call each method, passing the name of a disk file to hold the data. You can supply a complete path, but this code saves the files to the application's bin directory. Once again, the table names are in quotation marks.

  'Optionally save the XML data and schema to disk files
  If bSaveSchema Then
    'Places the file in this app's bin directory
  End If

The procedure returns the completed DataSet object to the calling procedure.

  Return ds
End Function

Creating a DataSet for a Single Entity

When you create a DataSet in a client application, ADO.NET pulls all of the data across the network from the database, as much as is necessary to fully populate the DataSet.

What if your table has many millions of records? You want to pull as little data as possible across the network to fill the DataSet, such as only the information on a single customer. In the Northwind database, the GetCustomerDataSet in the data access layer class does just that, using the CustomerID the user selects from the drop down list on the form.

The code you need is quite similar to the code in GetDataSet, but there are a few important differences. First, you now need to filter the data. That means filtering the table to return only the record needed, and doing the same for any relevant related tables but returning multiple records, like one for each order that customer has placed.

You can filter and sort data that is already contained in a DataSet, but that means that the entire set of data has to come across the network. A better way is to perform the filtering and sorting work on the database and return only the necessary records.

You can filter with a SqlParameter object. The parameters are applied to each dynamic SQL statement in the code, but the method is the same for calling stored procedures.

The following code populates the Customers table in the DataSet. Notice that now the SQL statement has a WHERE clause without an ORDER BY clause—there is no reason to sort a single record. The SqlCommand object is instantiated in the same way as before, but it requires setting the @CustID parameter. The SqlParameter object is instantiated with the parameter name, @CustID, and its data type, nchar(5). It then sets the parameter's Value property to the value passed into the procedure. The last step is to add the new parameter to the command object's Parameters collection and call the data adapter's Fill method as before. You can add as many parameter objects as you need for the SQL statement.

Dim sql As String = "SELECT CustomerID, CompanyName, ContactName, " _
  ContactTitle, Address, City, Region, PostalCode, Country, " _
  & "Phone, Fax FROM Customers WHERE CustomerID = @CustID"
Dim sqlCmd As New SqlCommand(sql, sqlCnn)
Dim sqlParam As New SqlParameter("@CustID", SqlDbType.NChar, 5)
sqlParam.Value = sCustID
Dim sqlDA As New SqlDataAdapter(sqlCmd)
sqlDA.Fill(ds, "Customers")

The other tables are populated in pretty much the same way, except that some filtered tables require a join back to the Orders table so that the CustomerID is accessible.

Each subsequent table also reuses the SqlParameter object, as all of them need to filter on the CustomerID, except for the Products table. You want to return all products in the database, regardless of whether any are ordered. That section of the code calls the Parameters collection's Clear method.

Populating Documents from a DataSet

Once you have populated a DataSet with data and have disconnected from the database, there are many ways to use the data in various Office documents. The Microsoft Office System can't process most .NET-based objects natively, so most of the time you need to use COM interoperability to manipulate the data and convert it into whatever form you need to add into a worksheet or other document. You can use the new integrated XML features of applications in the Microsoft Office System to accomplish this integration.

Inserting Data into a Worksheet

One way to use a DataSet is to automate the creation of a worksheet using many of the same programming techniques you probably used in VBA or from external applications, such as those written in Microsoft Visual Basic 6.

A Windows Form can use a custom BuildWorksheet method to open Microsoft Office Excel 2003, add a workbook, and create a worksheet with suitable data. Figure 2 shows just such a result. Unfortunately, there is not a property in Excel to which you can assign a reference to the DataSet as you can with the DataGrid Windows and Web Form controls. Instead you need to write some code to accomplish the task.


Figure 2. A new Excel 2003 worksheet populated with data from a DataSet

The Microsoft Office System exposes objects through COM objects. So Microsoft released a set of Primary Interop Assemblies (PIAs) that are optimized for accessing COM objects from .NET-based assemblies. You need to install the PIAs on each client computer, but they are available when you install the Microsoft Office System.

When installing the Microsoft Office System, select the advanced customization option in the wizard so that the next screen shows the list from which you can select or clear features, as shown in Figure 3. The .NET Programmability Support option installs the PIAs. You need to install PIAs for each application you plan to automate from a .NET-compatible application, and you can install the PIAs to the Global Assembly Cache (GAC) or include them in your project directory. You can find additional information in the following article: How to: Install Office Primary Interop Assemblies.


Figure 3. Using the wizard for the installation of PIAs

When you want to use the Excel 2003 PIAs in a .NET-based application, add a reference to them in the Add Reference dialog box of Visual Studio .NET, as shown in Figure 4. Once you add this reference, you can manipulate the Excel COM objects as though they were native .NET-based assemblies.


Figure 4. Add a reference to the Excel 2003 PIAs using the Add Reference dialog

BuildWorksheet is defined as a private Sub procedure in the ADONET.vb Windows Form, taking a DataSet object as its only parameter. Because it actually processes a single DataTable object with the customer data, the code starts by declaring a dt object variable to hold that reference.

Private Sub BuildWorksheet(ByVal ds As DataSet)
  Dim dt As DataTable

To manipulate an Excel workbook, the code needs to instantiate an Excel Application object, and then use the object model to add a new workbook, change the name (to Northwind Customers in this example), and do other UI chores.

  'Create an instance of Excel 2003, add a workbook, 
  'and let the user know what's happening
  Dim xl As New Excel.Application
  xl.ActiveSheet.Name = "Northwind Customers"
  xl.Visible = True
  xl.Range("A1").Value = "Loading the DataSet...."

Once Excel is loaded and running, the code needs a reference to the table in the DataSet ds. Then it loops through the Columns collection of that DataTable object and places the field names in the first row of the worksheet. This example uses a Customers table and a Customers column heading.

    xl.ScreenUpdating = False

    'Start with the Customers table
    dt = ds.Tables("Customers")

    'Add the column headings for the Customers
    Dim dc As DataColumn
    Dim iCols As Int32 = 0
    For Each dc In dt.Columns
      xl.Range("A1").Offset(0, iCols).Value = dc.ColumnName
      iCols += 1

All that's left is to loop through the rows in the table and use the ItemArray property of the DataRow object to write the contents of each data row to a row of the worksheet. This is about as close as Excel comes to supporting a DataSet object directly.

    'Add the data
    Dim iRows As Int32
    For iRows = 0 To dt.Rows.Count - 1
      xl.Range("A2").Offset(iRows).Resize(1, iCols).Value = _
  Catch ex As Exception

The remaining code in the procedure updates the UI and applies built-in formatting to the worksheet.

    xl.ScreenUpdating = True
  End Try

  'Make the sheet pretty
  With xl.ActiveSheet.Range("A1")
  End With

  xl = Nothing
End Sub

Even though the Microsoft Office System doesn't natively support .NET-based objects, the object model and features of the DataSet in the Microsoft Office System combine to make performing this kind of task simple.

Note   You can use the same technique no matter where the DataSet comes from, such as a .NET-based component or Web service.

Import a DataSet As XML into an Excel Worksheet

Microsoft Office Excel 2003 provides vastly enhanced support for XML, letting you save workbooks as XML and import from XML data sources. Because the native serialization format for a DataSet is XML, you can easily import its data into Excel. The import requires an XML schema so that Excel can properly map the data to cells in a worksheet, but a DataSet can automatically produce that for you. In fact, that is the purpose of the code at the end of the GetDataSet method described earlier in this document, which writes the DataSet object's data and schema to disk files.

If bSaveSchema Then
  'Places the file in this app's bin directory
End If

This code created the XML and schema files in the application's bin directory. Figure 5 shows the resulting XML file.


Figure 5. A Customers.xml file generated using the DataSet's WriteXml method

Figure 6 shows the XML schema, both as displayed in Internet Explorer.


Figure 6. A Customers.xsd file generated using the DataSet's WriteXmlSchema method

The technique also makes use of a new feature in Excel called ListObject, a new type of list structure that essentially provides a full-featured view into data. The resulting list is shown in Figure 7. You can click on the drop down lists in the header row to sort and filter the list. The XML Source pane in the figure displays the schema for the data, letting you modify what is included in the list.


Figure 7. The result of importing XML programmatically using an XML schema map

The BuildXMLMap Sub procedure performs a similar function to BuildWorksheet but uses XML importing and mapping to move the data from the DataSet to Excel. The procedure begins by accepting a DataSet object as its only parameter, instantiates Excel, adds a workbook, and does other UI chores. (The worksheet in this example is Northwind Customers. You substitute your own worksheet's name, of course.)

Private Sub BuildXMLMap(ByVal ds As DataSet)
  'Create an instance of Excel 2003, add a workbook, 
  'and let the user know what's happening
  Dim xl As New Excel.Application
  xl.ActiveSheet.Name = "Northwind Customers"
  xl.Visible = True
  xl.Range("A1").Value = "Loading the DataSet...."

The procedure loads the XML data directly from the DataSet object, but there isn't any way to read the schema directly in memory. So the code grabs a full path and file name to the .xsd schema file using the GetFullPath method of the .NET Framework Path object.

    Dim sMap As String = System.IO.Path.GetFullPath("Customers.xsd")

Next, the code adds the XML mapping schema to the XmlMaps collection of the active workbook. The first parameter of the Add method is the location of the mapping file and the second is the name of the map as saved in the collection. The line of code also sets the name of the map used later to reference this map.

    'Add the map to the active workbook
    'You can only add a map from a disk file.
    xl.ActiveWorkbook.XmlMaps.Add(sMap, _
      "NorthwindCustomerOrders").Name _
      = "NorthwindCustomerOrders_Map"

The code gets a reference to the new XML map, saves it to the map variable, and then adds a ListObject to the active worksheet. The list spans the columns of your choice, columns A through J in this case, and includes as many rows below the header row as are necessary to accommodate the data.

    'Specify the cells where the mapped data should go upon import
    Dim map As Excel.XmlMap = _
    xl.Range("A1", "J1").Select()
    Dim list As Excel.ListObject = _
      CType(xl.ActiveSheet, Excel.Worksheet).ListObjects.Add

The next step is to map the specific elements in the XML data to each column in the list. The SetValue method takes a reference to the map used and an XPath expression that identifies which elements to use for each column. The code below sets column A to the CustomerID element, then sets the heading for the column to "Customer ID". You can set the contents and the headings for the other columns in the same way.

    list.ListColumns(1).XPath.SetValue(map, _
    xl.Range("A1").Value = "Customer ID"

The structure of the ListObject is now set, so the next step is to import the XML data. The Import method reads a disk file and the ImportXml method—used here—reads it directly from the GetXml method of the DataSet without saving the XML data to disk. The procedure finishes by opening the XML Source task pane for your convenience

    'Import the XML data
    xl.ActiveWorkbook.XmlMaps("NorthwindCustomerOrders_Map"). _

    'Open the XML Source Task Pane

  Catch ex As Exception

  End Try

End Sub

Using Visual Studio Tools for the Microsoft Office System

The examples so far used a standard .NET Windows Forms application to insert a DataSet object's data into a document in the Microsoft Office System. The specific techniques for the various applications in the Microsoft Office System vary based on each application's object model, but the overall process is the same. Essentially, managed code is running Office as it loads the object models into the same process space as the running assembly. However, VBA developers are accustomed to a programming model where they write code behind documents with a tight connection between the document and the running code.

That is part of the purpose behind Visual Studio Tools for Office, a set of new Visual Studio .NET project templates that take advantage of the PIAs and COM interop to run managed code behind Excel and Word documents. The net effect is similar to that of writing and debugging VBA code behind a document. Only when using Visual Studio Tools for Office one is writing and debugging managed code Visual Studio .NET 2003 that is tightly coupled to the document at a more intimate level.

The next section shows how to populate a new Word document with data when the document is created from a template.

Populating a Word 2003 Template from a DataSet

Bookmarks have long provided a handy way to insert discrete pieces of data into a Word 2003 document. It is easy to create a new Word 2003 template, insert bookmarks with the proper formatting, and then programmatically access the bookmark collection to insert data.

When you create a Visual Studio Tools for Office project, you have the option of creating a new Word or Excel document. In the example below, the project wizard created a Word template called, shown in Figure 8.


Figure 8. A Word 2003 template

Notice the bookmark insertion points at the location for the recipient's name and address in Figure 8. For coding convenience, the document designer named the bookmarks the same as the fields in the data table.

The code used to populate the bookmark values is minimal and runs in the ThisDocument_New event procedure, which fires when a new document is created from the template. It starts by calling the GetDataSet function to build the DataSet object, a shortened version of the GetDataSet shown earlier in this document. This version creates a DataSet with only a Customers table.

Private Sub ThisDocument_New() Handles ThisDocument.New
  Dim ds As DataSet = GetDataSet()

Next, the code gets a reference to the document's bookmarks, the relevant table, and the sixth data row. This last is rather arbitrary, but a real application includes some mechanism for the user to select a particular customer name to generate the letter.

  'Fill in the document bookmarks
  Dim bookmarks As Word.Bookmarks = ThisDocument.Bookmarks
  Dim bookmark As Word.Bookmark
  Dim dt As DataTable = ds.Tables("Customers")
  Dim dr As DataRow = dt.Rows(5)

Finally, the code loops through the bookmark collection, replacing its text with the value of the field in the data table with the same name.

  For Each bookmark In bookmarks
    Dim bookmarkName As String = bookmark.Name
    bookmark.Range.Text = dr(bookmarkName)
End Sub

Figure 9 shows the results of running this procedure.


Figure 9. The results of populating bookmark text with data from a DataSet

A Word About Memory Management

In .NET Framework–based projects that interoperate with COM-based Office object models, disposal of COM objects and memory management requires a little more attention. When writing Office solutions in .NET Framework–based projects, you have a .NET Framework–based client communicating through a runtime-callable wrapper (RCW) to a COM object. The RCW is a proxy dynamically created at run time by the CLR from the metadata information contained in the Interop assembly. To the .NET Framework–based client, the RCW appears as any other CLR object, but the RCW really acts as a proxy marshalling calls between a .NET Framework–based client and a COM object. One of the by-products of this relationship is there arises a mismatch between the COM world of deterministic object lifetime management, and the .NET world of non-deterministic garbage collection. This mismatch can lead to objects not getting released from memory, application hangs, and other unpredictable behavior.

The main way to reconcile these two is to call the Marshal.ReleaseCOMObject method for each of the objects you have instantiated. The Marshal.ReleaseCOMObject method provides a deterministic release of COM objects, and you then have greater confidence that there are no objects left lurking in memory.


The programming features of the Microsoft Office System and the programming model of the .NET Framework together introduce exciting, new application development opportunities. Microsoft has introduced features that bridge the gap between the inherently different nature of the COM object in the Microsoft Office System and managed .NET-based assemblies. These are the Primary Interop Assemblies or PIAs, and Microsoft can enhance the PIAs and the tools you use to program against them, keeping Office secure in its position as a worthy development platform for productivity applications while taking full advantage of the many benefits of .NET-based development. Thanks to the PIAs you can write managed code to manipulate Office-based objects and incorporate them into your solutions. This document explored how to consume ADO.NET DataSet objects in Office documents, using COM interoperability and some of the new XML features of applications in the Microsoft Office System.

About the author

Don Kiely is a senior technology consultant. When he isn't writing software, he's writing about it, speaking about it at conferences, and training developers in it.