Office Unbound

Bring Your Documents To Life With Data Binding In Visual Studio Tools For Office

Eric Carter and Eric Lippert

This article discusses:

  • Defining data sources for Office documents
  • Creating data-bound documents
  • Understanding datasets, adapters, and sources
  • Binding to proxy objects
This article uses the following technologies:
Visual Studio 2005 Tools for the Microsoft Office System

Contents

Defining a Data Source
Data-Bound Documents
Datasets, Adapters, and Sources
Security Best Practices
DataSets
A Disconnected Strategy
Typed and Untyped Datasets
Adapters
Using Binding Sources as Proxies
Data-Bindable Controls
Special Considerations
Conclusion

Microsoft Visual Studio 2005 Tools for the Microsoft Office System provides a powerful new toolset for Office development within the Visual Studio® environment. One of the most important new features of Visual Studio Tools for Office is the ability to create documents that are data bound to databases, Web services, or objects. Elements of an Excel® workbook (called host controls in Visual Studio Tools for Office), such as a range or a list object or a bookmark in a Microsoft® Word document, can be data bound using the same tools and syntax used when creating Windows® Forms-based applications. Data-bound Windows Forms controls can also be added to a workbook or document using Visual Studio Tools for Office.

Creating a no-frills, data-bound customized document using the Visual Studio Tools for Office designer requires no coding, but a whole lot of mouse clicking instead. First we'll tell Visual Studio about a data source—in this case, the Northwind sample database that comes with Office—and then drag and drop some data-bound controls onto the spreadsheet.

Defining a Data Source

Let's start up Visual Studio and create a new Excel project. Select File, then New Project, and go to the Office folder under Visual C#® or Visual Basic® to select the Excel Workbook project.

From the Visual Studio Data menu, choose Show Data Sources to display the Data Sources pane. Click Add New Data Source to start the Data Source Configuration Wizard (see Figure 1).

Choose Database and click Next. Then click New Connection. A second wizard will appear listing a variety of data source options, such as Microsoft Access Database File, Microsoft ODBC Data Source, and Microsoft SQL Server™.

Figure 1 Starting the Data Source Configuration Wizard

Figure 1** Starting the Data Source Configuration Wizard **

Choose Microsoft Access Database File and click Continue to go on to the Connection dialog. The Northwind database file is typically in the \Program Files\Microsoft Office\Office11\Samples directory. Click Browse and find the Northwind database. No security is enforced on this database file, so the default Admin username and a blank password will work (in a real-world application with a secured database, it would be a very bad idea to have a blank administrator password). Click OK to close the Connection Wizard and continue with the Data Source Wizard.

When you click Next, Visual Studio notes that you are creating a connection to a local database file that is not part of the current project. If you want this project to have its own copy of the database rather than modifying the original, you can choose to do this and Visual Studio will automatically update the connection to point to the new location. In this first example, there is no reason to make a copy of the database, so click No.

As you can see, all the information about the database connection that you have just created is saved in a connection string. For both convenience and security, it is a good idea to save that connection string in a configuration file rather than hard-coding it into your program.

Figure 2 Choose Your Tables

Figure 2** Choose Your Tables **

The database to which we are connecting might have an enormous number of queries, tables, columns within those tables, and so on. To manage some of this complexity, Visual Studio lets you choose which portions of the database will display in Visual Studio. Let's select the entire Suppliers table and the ProductName, SupplierID, QuantityPerUnit, and UnitPrice columns from the Products table (see Figure 2). Finally, click Finish to exit the wizard.

Data-Bound Documents

The Data Sources window now contains an entry for the NorthwindDataSet. Expand the nodes in the tree view. There are a few interesting things here. First, Visual Studio has discovered from the database that the Products table has a relationship with the Suppliers table; the Products table appears both as a table in its own right and as a child node of the Suppliers table. This will allow us to more easily create master-detail views.

Second, notice that the icons for the columns have "named range" icons, indicating that if you drag the icon and drop it onto the worksheet, you will get a data-bound named range to this column. The default for a column is a named range, and the default for an entire table is a list object, but you can choose other controls by clicking the item and selecting from a dropdown. Another powerful feature of Visual Studio Tools for Office is that you can actually put Windows Forms controls, such as a combobox, into a Word or Excel document and data bind to Windows Forms controls in the document. Suppose you want to have a combobox bound to the CompanyName, for instance. You can choose ComboBox from the dropdown as the control to use for CompanyName, as shown in Figure 3.

Figure 3 Choose the Control Type

Figure 3** Choose the Control Type **

Drag the CompanyName as a combobox, the ContactName as a named range, and the entire Products table onto the worksheet. Use the Products table that is the child of the Suppliers table in the tree view and you'll get a nice master-detail view.

A whole lot of stuff has magically appeared in the component tray below the Excel designer: a DataSet, two binding sources, and two table adapters. We'll get into the details of what these components are for later. For now, just compile and run the application. Without writing a single line of code, you have gotten a data-bound master-detail view on an Excel spreadsheet. As you select different items from the combobox, the named range and list objects automatically update themselves (see Figure 4).

Figure 4 Data-Bound Master-Detail Spreadsheet

Figure 4** Data-Bound Master-Detail Spreadsheet **

We can create a similar data-bound document in Word using bookmarks rather than named ranges, and a DataGrid rather than an Excel list object. Create a new Word document project and again add the Northwind database as a data source to the Data Sources pane. Visual Studio should remember the connection string from last time, so you will not need to configure it again. Unfortunately, in this version of Visual Studio Tools for Office, there is no way to bind a data table to a Word table as you can with an Excel list object.

Figure 5 Master-Detail View at Run Time

Figure 5** Master-Detail View at Run Time **

When we build and run the customized Word document, we have a master-detail view of a data table running in Word, once again without writing a single line of code (see Figure 5).

Datasets, Adapters, and Sources

Now that you have seen a couple of no-coding-required examples, let's take a peek under the hood and see how data binding actually works. The data-binding model used by Visual Studio Tools for Office is almost identical to the model used by Windows Forms. Several players make data binding work, many of which can be seen on the component tray or design surface:

  • A back-end data source, such as a Microsoft Access database, a remote SQL Server database, a Web service, or some other data storage and retrieval system, is where the data ultimately lives.
  • A DataSet provides a disconnected local cache of some portion of the back-end data source.
  • An adapter connects the DataSet to the back-end data source, both to fill the DataSet from the back-end source and to update the back end with any changes. There is usually one adapter per table, which is why there are two adapters in the preceding example.
  • A binding source acts as the intermediary between the user interface control and the DataSet. Although it is possible to bind a control directly to a DataSet, as we'll discuss later, it is usually more convenient to go through a dedicated binding source object.
  • A data-bindable control provides a user interface element that enables the user to read or write the data. Using Visual Studio Tools for Office some built-in elements in Word and Excel documents are data-bindable, such as a Word bookmark and an Excel range. Windows Forms controls can also be added to a Word document or Excel spreadsheet and are data-bindable as well.

The back-end data source is represented in a Visual Studio Tools for Office project by the connection string passed to the adapter; everything else is represented by a member of the customized host item (the worksheet or document) class.

Security Best Practices

As you probably noticed in the Connection Wizard, all the information required to connect to the back-end data source is stored in a connection string generated by the wizard. It typically looks something like this:

Server=MyDataServer;Database=Customers;Integrated Security=true;

That is, it says where the database is located, what it is called, and how the user should be authenticated. All of this is potentially sensitive information. Use caution when embedding connection strings into your programs. Remember, even without the source code, it is very easy to figure out which strings are embedded in a managed application. This particularly applies to connection strings where, instead of using security integrated into Windows NT®, you simply embed something like this directly:

UserID=eric;Password=BigSecret123 

Furthermore, hardcoding embedded strings in your source code makes it hard for developers, testers, end users, and database administrators to update your application should the database connection information change over time. As discussed previously, Visual Studio gives you the option of embedding the connection string in the configuration file. The automatically generated configuration file for the example just shown looks something like this:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections/>
  <connectionStrings>
    <add name=
      "ExcelWorkbook11.Properties.Settings.NorthwindConnectionString"
      connectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data 
      Source='C:\Program Files\Microsoft
      Office\OFFICE11\SAMPLES\Northwind.mdb'"
      providerName="System.Data.OleDb" />
  </connectionStrings>
</configuration>

It is also a good idea to use the principle of least privilege. This is one of the fundamental principles of secure design: grant exactly as much privilege as the user needs to get the job done—no more, and no less. For example, if your user needs to be able to read from the database but not write to it, do not specify a connection string that gives the user administrator rights to the database. Instead, choose a connection string that specifies a username and password with read-only access. That way, if the username and password are ever compromised, at least the attacker does not get administrator access out of it.

Better still, do not use stored user IDs and passwords at all; some databases use integrated Windows authentication so the logged-on user can seamlessly use his already authenticated credentials. Or, if your database system requires a username and password, make the user type them in rather than storing them. As you'll see later when we discuss adapters, you can manually change the connection string used by the adapter before it fills the DataSet. That way you can ask the user to type in a user ID and password and then generate a new connection string from that information.

DataSets

The cornerstone of the Visual Studio Tools for Office data model, and of ADO.NET in general, is the DataSet. We should explain the existence of DataSets by describing the old way of doing data access. Back in the twentieth century, you typically communicated with a database via "ADO Classic" which went something like the following:

  1. Create and open a connection to a database.
  2. Create and execute a database command such as:
SELECT partnumber FROM invoices WHERE price > 100
  1. Enumerate the resulting record set.
  2. Close the connection.

This approach worked fairly well, but it has several drawbacks. The principal drawbacks are consequences of the fact that this model requires a live connection to a database. If there are going to be many live connections, the server needs to be scalable and robust, which can be expensive. Therefore, to minimize load upon the server, we want connections to be short-lived. But because the connection is open while the user is enumerating the record set, the connection is typically open for quite some time—as long as the user is working with the data. Furthermore, even if the server-side expense of keeping connections open is unimportant, this model does not work well in a world where you want to be able to work with your data even if you temporarily lack network connectivity.

A Disconnected Strategy

Database connections are both expensive and necessary and therefore must be managed carefully. In a typical ADO application, much developer effort is expended writing code to ensure that the connection is open for as little time as possible while still meeting the needs of the application's users. ADO.NET addresses the problems of ADO by going straight to the root: if we cannot make connections inexpensive, we can at least make them less necessary. ADO.NET is therefore fundamentally a disconnected strategy. A typical ADO.NET scenario goes something like this:

  1. Create a DataAdapter to manage the connection to a specific database or other data source.
  2. Set properties on the adapter that tell it what query to execute against the database.
  3. Create a DataSet to be filled.
  4. Invoke a method on the adapter to take care of the details of opening a connection, executing the query, saving the results in the DataSet, and closing the connection as soon as possible.
  5. Work with the data in the now-disconnected DataSet.
  6. When finished working with the data, invoke a method on the adapter to re-open the connection to the database and update it with any changes.

And indeed, as you will see later when we discuss adapters, Visual Studio Tools for Office does exactly this on your behalf.

Because the DataSet acts much like the original database, the connection only needs to be open as long as it takes to fill the DataSet. After the data has been copied to the DataSet, you can query and manipulate the DataSet for as long as you want without having to worry that you are consuming a valuable database connection.

Furthermore, there is no reason why the data used to fill the DataSet has to come from a connected database; you could also fill the DataSet from an XML file or write a program to add tables and rows to build one from scratch. DataSets have no knowledge of where the data they contain came from; if you need it, all that knowledge is encapsulated in the adapter.

Typed and Untyped Datasets

In the Solution Explorer of the Word and Excel projects we created earlier, you will find a NorthwindDataSet.xsd file containing the database schema. This is an XML document that describes the tables, columns, and relationships that make up the DataSet. One of the child nodes in the Solution Explorer tree view is NorthwindDataSet.Designer.cs. This file contains the automatically generated code for the DataSet and table adapters.

The first line of the declaration is interesting:

public partial class NorthwindDataSet : System.Data.DataSet {

The generated class is partial so that, if you need to add your own extensions to it, you can do so in a separate file. It is a bad idea to edit automatically generated files. More importantly, this DataSet extends the System.Data.DataSet class. A System.Data.DataSet consists of a collection of data tables. As you would expect, data tables consist of a collection of data columns and data rows. Each class exposes various collections as properties that allow you to navigate through the DataSet.

System.Data.DataSet is not an abstract class—you can create instances and fill them from any back-end data source. But that would be an untyped DataSet; the NorthwindDataSet is a typed DataSet. Untyped DataSets give you great flexibility but are so general that they are somewhat harder to use. For example, if you were to fill an untyped DataSet with data from the Northwind database file, you could access a particular datum with an expression such as this:

name = myDataSet.Tables["Products"].Rows[1]["ProductName"];

But that flexibility comes at a cost: you can accidentally pass in a bad table name or column name, or make a bad assumption about the type of the data stored in a column. Because the structure of the tables and the types of the columns is not known at compile time, the compiler is unable to verify that the code will run without throwing exceptions. Also, the IntelliSense® engine is unable to provide any hints about the DataSet's structure while you are developing the code.

Typed DataSets mitigate these problems. A typed DataSet is a class that extends the DataSet base class; it has all the flexible, untyped features of a regular untyped DataSet, but also has compile-time strongly typed properties that expose the tables by name. A typed DataSet also defines typed data table and data row subclasses.

As you can see from the NorthwindDataSet.Designer.cs file, the typed DataSet has public properties that enable you to write much more straightforward code, such as the following :

name = myDataSet.Products[1].ProductName;

Typed DataSets extend untyped DataSets in many ways; some of the most important are as follows:

  • Tables are exposed as read-only properties typed as instances of typed data tables.
  • Tables have read-only properties for each column.
  • Tables have an indexer that returns a typed data row.
  • Event delegates for row change events pass typed change event arguments. Each row type has a row-changing, row-changed, row-deleting, and row-deleted event. (You might be wondering where the row-adding and row-added events are. The changing/changed events pass a DataRowAction enumerated type to indicate whether the row in question was newly created.)
  • Tables provide methods for adding and removing typed data rows.
  • Rows provide getters, setters, and nullity testers for each column.

In short, it is almost always a good idea to use a typed DataSet. Weakly typed code is harder to read, harder to understand, and harder to maintain.

Adapters

Take a look at the Startup event handler in the Word or Excel examples discussed previously. Visual Studio has automatically generated the code (shown in Figure 6) on your behalf. The one new thing here is the NeedsFill method. NeedsFill is a special Visual Studio Tools for Office method that works with Visual Studio Tools for Office data caching feature. This feature allows you to mark a DataSet as cached so that the DataSet is saved in the document or spreadsheet being customized. This article doesn't cover this feature in any additional detail—for now, the rest of the code should look fairly straightforward. If the two tables need to be filled from the back-end data source, the adapters fill the appropriate tables.

Figure 6 Auto-Generated Table-Filling Code

public partial class ThisDocument
{
    private void ThisDocument_Startup(
        object sender, System.EventArgs e)
    {
        // TODO: Delete this line of code to remove the default
        //       AutoFill for 'northwindDataSet.Products'.
        if (this.NeedsFill("northwindDataSet"))
        {
            this.productsTableAdapter.Fill(
                this.northwindDataSet.Products);
        }

        // TODO: Delete this line of code to remove the default
        //       AutoFill for 'northwindDataSet.Suppliers'.
        if (this.NeedsFill("northwindDataSet"))
        {
            this.suppliersTableAdapter.Fill(
                this.northwindDataSet.Suppliers);
        }
    }
}

There are a number of reasons why you might not want to automatically fill the data tables in the Startup event, which is why the comment points out that you can remove the auto-generated code. For example, as mentioned earlier, you might want to require that the user enter a database password before attempting to fill the DataSet. You can generate a new connection string and then set the adapter's Connection.ConnectionString property.

Or perhaps you want to give the user the option of whether to connect to the back end. If the user is on an expensive or slow connection, he might want to skip downloading a large chunk of data. For any number of reasons, you might not want to connect right away or use the default connection string, so Visual Studio allows you to modify this startup code.

Visual Studio generates strongly typed custom adapters at the same time as it generates the typed DataSet. If you read through the generated adapter code in NorthwindDataSet.Designers.cs, you will see that the generated adapter has been hardcoded to connect to the database specified by the connection string in the configuration file. The bulk of the generated adapter code consists of the query code used to read from the back-end data store into the typed DataSet, and to then take any changes in the DataSet and update or delete the appropriate rows in the store.

The adapter takes care of all the details of opening the connection, executing the query, copying the data into the DataSet, and closing the connection. At this point, you have a local copy of the data, which you can use to your heart's content without worrying about taxing the server further.

When you are finished editing the local copy of the data in the DataSet, you can use the adapter in order to update the database with the changes by calling the Update method of the adapter. The adapter will then make the various additions, changes, and deletions on the back-end database.

Using Binding Sources as Proxies

Why does Visual Studio bind the controls to a BindingSource proxy object, rather than binding controls directly to the data table? The reason is because the control can bind to the proxy even if the data in the table is not currently available. For instance, perhaps the data is going to be derived from a call to a Web service, which will not happen until long after the initialization is complete, or until the user types in his password or presses a button to start the database connection.

The proxy object is created when the customization starts up and controls can be bound to it even if there is no real data available. When the real data is available, the binding source updates the controls. It is essentially just a thin shim that makes it easier to set up bindings before all the data is available.

As you saw in the examples, multiple controls can share the same binding source and therefore have the same currency. That is, when one control is updated, every other control linked to the same binding source is also updated automatically. Controls on different worksheets can share binding sources and thereby share currency. It is easy to change the currency and move to another record in the database by using the MoveNext and MovePrevious methods on the BindingSource object.

Data-Bindable Controls

The last piece of the data binding puzzle is the host control (such as a Word bookmark or Excel range) or Windows Forms control on the spreadsheet or document that actually displays the bound data. There are two flavors of data-bindable controls: simple and complex. Controls that can bind a single datum to a particular property are simple-data-bindable. Controls that can bind multiple rows and/or columns are complex-data-bindable.

In the preceding examples, the list object in Excel and the combobox and DataGrid Windows Forms controls are complex-data-bindable; the list object and DataGrid display multiple rows and columns from a table, and the combobox displays multiple rows from a single column. The bookmark and named range controls by contrast are simple-data-bindable; only a single datum is bound to the Value property of the named range.

All the Windows Forms controls are simple-data-bindable, as are almost all of the Word and Excel host items and host controls. (There is one exception: the Word XMLNodes host control is neither simple- nor complex-data-bindable.) Of the host items and host controls, only the Excel list object is complex-data-bindable.

Special Considerations

It is easy to use data binding to move the data one way into the document or workbook. But to get the data back from Excel or Word to the database (presuming you want the user to be able to edit the data and push it back to the database) requires some extra steps.

First of all, you need to use the WriteValue method available on any simple data-bound host control or Windows Forms control. You do this by writing code like this:

myNamedRange1.DataBindings.Item(0).WriteValue()

You typically want to write this code within a change event for the named range or bookmark or at some other point before you are going to write back to the database.

This line of code gets the changed data from the named range in this case back to the DataSet. Next, you need to tell the bound DataSet to accept changes made to it (because of WriteValue) by calling AcceptChanges on the DataSet:

MyDataSet.AcceptChanges()

Alternatively, if you are using a BindingSource object, you can call EndEdit on the BindingSource object.

Finally, now that the change has been WriteValue'd back to the DataSet from the data-bound named range, and the change has been Accepted by the DataSet, you need to write the modified DataSet back to the actual database by using the table adapter that was used to fill the DataSet and calling Update on the table adapter:

Me.MyTableAdapter.Update(Me.MyDataSet.MyTable)

Note that you don't have to do the WriteValue and AcceptChanges steps if you are using a data-bound Excel list object—it takes care of these two steps automatically because it is doing complex data binding rather than simple data binding.

Conclusion

Visual Studio 2005 Tools for Office provides powerful tools for data binding that, for the most part, use the same model used by Windows Forms. It allows you to use your existing knowledge of data sources and data binding in Visual Studio 2005 to create data-bound Office solutions without having to understand the complex object models of Excel and Word.

Eric Carter and Eric Lippert are developers at Microsoft. They discuss Visual Studio Tools for Office in depth in the books Visual Studio Tools for Office: Using C# with Excel, Word, Outlook, and InfoPath (Addison-Wesley Professional, 2005) and Visual Studio Tools for Office: Using Visual Basic 2005 with Excel, Word, Outlook, and InfoPath (Addison-Wesley Professional, 2006). Portions of this article have been adapted from the book Visual Studio Tools for Office: Using C# with Excel, Word, Outlook, and InfoPath with permission from Addison-Wesley.