Export (0) Print
Expand All

Exposing Proprietary Data the .NET Way

 

Dino Esposito
Wintellect

September 11, 2001


Download Dida09132001_source.exe.

In the past few years, a lot of companies have invested a lot of money in the OLE DB technology. OLE DB has been the technology that turned into the concrete programming calls that the vision of a universal data access strategy theorized in the UDA specification. OLE DB is built around the idea of using a suite of COM interfaces to read and write the contents of data sources irrespective of their relational, hierarchical, or flat architecture.

Perhaps the C++ oriented design that OLE DB came up with is a bit too complex, but it definitely provides great flexibility when it comes to exposing proprietary data in a standardized way. OLE DB supplies a common COM-based API by means of which consumer applications talk to data provider modules without the need to know about internal details. Each provider encapsulates a particular data store, like a commercial DBMS, a system component like Active Directory™, or more simply, the manager of proprietary data with a custom format.

The advent of .NET somehow pushed all COM-based technologies to the side, and OLE DB was no exception. All .NET applications that want to access data through OLE DB providers must jump out of the Common Language Runtime (CLR) environment and rely on interoperability services. All this results in a certain overhead that never benefits the overall application performance. To solve this issues at the root, .NET introduced managed providers, which share much the same pattern of OLE DB providers, while trying to do everything within the context of the CLR.

The .NET Framework comes with the OLE DB managed provider, which is meant to be a bridge that gives you immediate access to all the existing OLE DB providers. This component saves your current investments, as applications can continue successfully calling into existing OLE DB providers. The point, though, is another.

In the pre-.NET era, OLE DB providers were the only option you had to make your proprietary data publicly available—that is, available in a widely accepted format. For relatively simple data formats (like comma-separated files), you also had the option of using the OLE DB Simple Provider Toolkit. The Simple Provider Toolkit is a tool to write, even in Microsoft® Visual Basic®, shrink-wrapped OLE DB providers with a limited number of functionalities.

In .NET, exposing proprietary data requires a more thoughtful choice, mostly because you have multiple and equally powerful options from which to choose.

The Nature of the Data

To expose custom data in a format that can be easily consumed from a .NET application, you can take any of the following routes:

  • Develop a made-to-measure XML schema and a class around it
  • Write an OLE DB provider
  • Write a simple .NET managed provider
  • Architect and develop a full-fledged .NET managed provider

Where should you concentrate your efforts? That mostly depends on the inner nature of the data you are providing and the audience you expect to reach. Each of the approaches above has pros and cons, and choosing the one that best fits your data is—guess what—up to you.

Made-to-Measure XML Schemas

As Aaron Skonnard clearly explains in The XML Files column (September 2001 issue of MSDN Magazine), today XML lets you obtain the same results of "universality" as OLE DB. In addition, XML is free from the problems that prevented OLE DB from becoming a recognized and widely accepted technology for data access and data sharing. XML is terrifically simple, human-readable, human-authorable, and, last but not least, really ubiquitous and available on all platforms. If you publish data as XML, chances are that anyone, at any time, and on any platform, can read and process it in a seamless way.

In .NET, you have a lot of classes and facilities to manipulate XML documents—from XmlReader to XmlDataDocument, and from XmlWriter to XPathNavigator. Using a public XSD schema can make your document more easily understood. As a general recommendation, though, bear in mind that the more advanced XML-related technologies you use, the more you expose yourself to the risk of a potential restricted audience. The parser used to process your documents on a certain platform might not be updated to support XSD schemas or namespaces and so on.

We've talked about it; now let's look at some examples. Let's see how to grab some directory information and expose it as XML. The idea is to make available a DirectoryListing class that returns in various formats—XML string, DataTable, disk file—all the subdirectories below a given path with some information like name, last accessed time, and any user comments. The DirectoryListing class looks like this:

class DirectoryListing
{
   private String m_path = "";

   // Constructors
   public DirectoryListing()
   {
   }
   public DirectoryListing(String path)
   {
      m_path = path;
   }

   // Various overloads for the Fill method that internally uses 
// the DirectoryInfo class to walk through subfolders

   public String Fill()
   {
      return Fill(m_path);
   }

   // Creates a DataSet and returns its XML representation
public String Fill(String path)
   {
      DirectoryInfo dir = new DirectoryInfo(path);
      DataTable dt = PrepareDataTable("Folder");

      DataSet ds = new DataSet("DirectoryListing");      
      ds.Tables.Add(dt);
      FillDataTable(dt, dir);

      return ds.GetXml();
   }

   // Fills the supplied DataTable object with directory info 
   public void Fill(String path, DataTable dt)
   {
      DirectoryInfo dir = new DirectoryInfo(path);
      PrepareDataTable("Folder", dt);
      FillDataTable(dt, dir);
      return;
   }

   // Creates a DataSet and dumps its XML representation to the 
// specified file
   public void Fill(String path, String filename)
   {
      DirectoryInfo dir = new DirectoryInfo(path);
      DataTable dt = PrepareDataTable("Folder");

      DataSet ds = new DataSet("DirectoryListing");      
      ds.Tables.Add(dt);
      FillDataTable(dt, dir);

      StreamWriter sw = new StreamWriter(filename);   
      ds.WriteXml(sw);
      sw.Close();
      return;
   }

// Private members go here...
}

Basically, the class lets you specify a path and obtain a tabular structure with directory information. This information is available through a simple XML schema or through a DataTable object. Of course, XML is much more useful if you want or need to share this information with non-.NET components on the Microsoft® Windows® platform, or any other platforms for that matter.

In this case, the XML schema is simple, but you can make it complex and rich at will.

<DirectoryListing>
   <Folder>
      <FolderName>...</FolderName>
      <LastAccessed>...</LastAccessed>
      <Comment>...</Comment>
   </Folder>
   :
</DirectoryListing>

The DirectoryListing class can also return data through a versatile, but .NET-specific class like DataTable.

The DataTable structure provides unmatched flexibility if the data must be used only within .NET applications. A DataTable object, in fact, is a sort of super-array class and, more importantly, can be easily bound to data-bound controls like the DataGrid, the DataList, or the Repeater. The code snippet below shows how to get a directory listing and display it through a datagrid in an ASP.NET page.

<%@ Page Language="C#" %>
<%@ Assembly Src="DirectoryListing.cs" %>
<%@ Import Namespace="System.Data" %>
:
public void OnGoGetData(Object sender, EventArgs e)
{
   DataTable dt = new DataTable();
   DirectoryListing dl = new DirectoryListing();         
   dl.Fill(txtPath.Text, dt);

   grid.DataSource = dt;
   grid.DataBind();   
}

The figure below illustrates a typical output of this code.

Figure 1. A datagrid bound to the DataTable returned by the DirectoryListing class

Incidentally, the folder comment is taken from the Infotip entry in the desktop.ini file, should one be found in the folder.

Simple .NET Managed Providers

You probably noticed that the key method in the DirectoryListing class is called Fill, like the key method of data adapter classes. In a data adapter class, the method Fill is responsible for the actual data provision. It opens a channel with the underlying data source and reads the needed number of rows. These rows are then packed into a DataTable object, within or outside a container DataSet.

A plain class to expose data works fine, and once you have compiled it into an assembly, it instantly becomes a tool you can distribute and deploy quite easily. A class, though, is a rather general-purpose software tool. A more consistent, and even elegant, approach to expose proprietary data through standard tools and formats in .NET is to write a simple .NET managed provider.

The functionality that a .NET managed provider supplies generally falls into two distinct categories:

  • Support of the DataSet class through the implementation of the IDataAdapter interface
  • Support of connected data access scenarios, which includes classes representing connections, commands, transactions, and parameters

In other words, in a .NET data provider, the internal mechanisms for connected and disconnected data manipulation are neatly separated. As in OLE DB, you may have simple and complex providers. The difference between the two is in the number of the interfaces that are implemented. The simplest flavor of a .NET managed provider interacts with callers only through the DataSet object, both in reading and writing.

A simple .NET managed provider has a structure that is functionally similar to the DirectoryListing class examined above. In terms of the internal structure, though, there's a significant difference. The class that outfits a simple managed provider must implement the IDataAdapter interface and, subsequently, must expose a number of methods and properties. Let's consider extending the DirectoryListing class to make it work as a managed provider. For one thing, the class should have at least two constructors. The default one takes no argument and initializes the internal state of the object in a default way. A second constructor should take one or more arguments being the command text to run against the data source.

public class DirectoryDataAdapter : IDataAdapter
{
   private String m_strPath;
   private DirectoryInfo m_di;

   // Constructors
   public DirectoryDataAdapter()
   {
      m_strPath = "c:\\";
   }

   public DirectoryDataAdapter(String strPath)
   {
      m_strPath = strPath;
   }

   :
}

In this case, the directory path is the command text, and the private member of type DirectoryInfo represents the connection with the file system through which information will be retrieved. Notice that in Visual Basic, the class declaration would have been

Public Class DirectoryDataAdapter 
Implements IDataAdapter
:
End Class

The IDataAdapter interface comprises the following properties:

Property Description
MissingMappingAction Indicates how the class should behave when a column is unmapped (namely it has no entry in the table of mappings).
MissingSchemaAction Indicates how the class should behave in presence of missing source tables, columns, and relationships.
TableMappings Returns a collection object (empty by default) that contains mappings between the column names used by the data source and the column names used by the DataSet.

The MissingMappingAction property determines the action that occurs when a column name used by the caller is not mapped to a data source column in the TableMappings collection. The MissingMappingAction enumeration defines three feasible actions: an exception is thrown, the column is ignored, or, as happens by default, the column is processed using the original data source name. The role of MissingMappingAction is strictly related to the TableMappings collection. TableMappings lets you use custom names for columns that do not match the real names in the data source. This association is called mapping. TableMappings is empty by default, which means that a column is processed with its original name until a mapping is created.

MissingSchemaAction specifies the action to take when someone is trying to add data to the DataSet, but the required DataTable or DataColumn is missing. In this case, by choosing one of the values in the MissingSchemAction enumeration, you can decide to raise an error, ignore the missing tables and columns, add the necessary tables and columns, or even complete the schema with key information.

The IDataAdapter interface comprises the following methods:

Method Description
Fill Adds or refreshes rows in the DataSet. In your implementation, you should define the name of the tables used and created.
FillSchema Adds a table to the DataSet with the needed column schema.
GetFillParameters Retrieves the parameters that the user embedded in the command text.
Update Executes the operations necessary to update the data source.

For simple data formats, you can have all the methods but Fill; simply raise the NotSupportedException exception. For example,

public int Update(DataSet ds)
{
   throw new NotSupportedException();         
}

Similarly, you don't need mappings or automatic key manipulation for simple formats . Hence, all the properties could have an empty implementation. Generally speaking, when writing a managed provider, you should avoid no-op implementation. In case you don't want to provide a certain, optional functionality, then throw the NotSupportedException exception instead of inserting dummy code.

Thus, the implementation of a simple .NET managed provider results in a class that has a handful of "empty" functions and just one working method—Fill. The source code for method Fill closely resembles the core code of the DirectoryListing class. The IDataAdapter interface requires Fill to have at least one version that returns an integer and takes a DataSet parameter. Of course, you are free to define as many overloads as you need.

public int Fill(DataSet ds)
{
   DataTable dt = new DataTable("FolderContents");
   ds.Tables.Add(dt);
   Internal_CreateTable(dt);
   return Internal_FillTable(dt);
}
public int Fill(DataSet ds, String strTableName)
{
   DataTable dt = new DataTable(strTableName);
   ds.Tables.Add(dt);
   Internal_CreateTable(dt);
   return Internal_FillTable(dt);
}

What is interesting to see now is how you use this simple .NET managed provider. The following code in an excerpt from an ASP.NET page that reads a path from a textbox and populates a datagrid with the results.

public void OnLoadData(Object sender, EventArgs e)
{
DirectoryDataAdapter da;
da = new DirectoryDataAdapter(txtCommand.Text);
   
   DataSet ds = new DataSet();
   da.Fill(ds, "MyTable");

   // Display the data
   grid.DataSource = ds.Tables["MyTable"];
   grid.DataBind();
}

You use the DirectoryDataAdapter class in the same way that you take advantage of the Microsoft® SQL Server data adapter in any ADO.NET application. You create a new instance of the class specifying the arguments required by the adapter's constructor. For simple providers, you should allow users to indicate the command text through the constructor, rather than setting up made-to-measure, non-standard properties or methods. Once the adapter is up and running, you call its method Fill and pass a DataSet object to be filled. The method Fill will create and add a new table that you can then use as any other .NET collection class. The figure below shows the new data adapter in action.

Figure 2. The new data adapter in action

Exposing Data Stores

Writing a .NET managed provider ties you to the .NET platform. To make your data reach the widest audience, you can either write an old-style OLE DB provider or use XML as the serialization format of your data. An OLE DB provider, no matter whether you make it simple or full-featured, works great in the .NET world, as well as in the Win32 arena. If you need a platform-independent slot-machine that distributes your data to any caller, then go for the XML and Web services.

Being a publicly available URL, a Web service can be reached from any platform where HTTP support is supplied. If the Web service then happens to return data in XML format—that is, another ubiquitous data format—you would probably solve once and for all any issue that concerns exposure of proprietary data.

OLE DB providers, however, remain the most powerful solution to publish your data if you need to reconcile two key requirements—that is, it be widely reachable and provide a standard format. OLE DB is much more than just a data format. OLE DB is a complex specification for software systems built on top of data—basically, data stores. Accessing a data store is a bit more complex, because it requires sophisticated policies for connection, transaction handling, and parameterized commands. OLE DB has been designed with this in mind, and provides you with all the tools you need.

Due to their COM dependency, OLE DB providers are not highly recommended in .NET. Or, at least, they should be avoided if the same data store can be accessed through a full-fledged .NET managed provider. The .NET Framework comes with two distinct tools to access SQL Server 2000 databases—the classes in the System.Data.SqlClient and the System.Data.OleDb namespaces. The former leverages the .NET managed provider for SQL Server; the latter is serviced by the .NET managed provider for OLE DB. The two perform very different tasks. The native .NET provider hooks up SQL Server at the wire level and communicates with it using RPC and optimized tabular data stream (TDS) packets. The generic managed provider for OLE DB relies on the COM Interop services to jump out of the CLR, connect to the COM version of the SQL Server OLE DB provider (SQLOLEDB), and then use its services to get and set data. As you can see, there is a substantial performance advantage in using the native provider. This is a common pattern for data access in .NET. Take it as a general rule; a .NET managed provider always goes faster than an OLE DB provider.

About .NET Managed Providers

A full-featured managed provider implements a number of interfaces, including IDataAdapter, IDbDataAdapter, IDbConnection, IDataReader, and IDbCommand. At the highest level of abstraction, it can be seen as a collection of data source-specific objects that take care of special tasks like opening a connection, executing a command, and returning disconnected data. Instead of communicating through COM interfaces like OLE DB providers, .NET providers talk to client using .NET classes. Furthermore, thanks to the language neutrality of .NET, they can be programmed in the same way irrespective of the language, which is a quantum leap from the world of COM and OLE DB.

In my next column, I'll be exploring the internals of managed providers, and taking a closer look at the newest .NET provider for ODBC drivers.

Dialog Box: Sorting on Relations

In my application, I have a DataSet with two tables put into relation through a DataRelation object. I display the master table in a datagrid and then, when one row gets selected, I access the child records through GetChildRows. All this works fine, yet I have a problem. How can I retrieve the child rows in a sorted way? I tried setting the Sort property on the default view of the child table, but with no result.

I know the issue, but so far I've only been able to work around it. First of all, the problem magically disappears if you happen to get the child table already sorted. Unless you need to determine dynamically which column to sort by, the issue could be closed by simply appending an ORDER BY clause to your SQL query statement. (Incidentally, this brilliantly solved my instance of the problem.)

Another workaround consists of returning a sorted array to the control that actually displays the child rows. Whatever type they are made of, including DataRow objects, arrays can be sorted through a class that implements the IComparer interface. You call the static method Sort and specify the type-specific class. For example,

class DataRowComparer : IComparer
{
   public int Compare(Object x, Object y)
   {
      DataRow drX = (DataRow) x;
      DataRow drY = (DataRow) y;

      return String.Compare(drX["column_name"].ToString(), 
         drY["column_name"].ToString());
   }
}

To sort using this algorythm, do the following:

DataRow[] adr = dr.GetChildRows("relation_name");
DataRowComparer drc = new DataRowComparer();
Array.Sort(adr, drc);
foreach(DataRow tmp in adr)
{...}

In this way, the child rows will be enumerated according to the sort algorythm.

I repeatedly used the expression workaround here. There's a reason for this. The perfect solution for this kind of issue would be the DataViewManager object. You set the Sort property on a table specific view, set the DataViewManager object as the data source property of the list control, and tell to the DataMember property of the control to get data from the given table. It works, but so far only for Windows Forms datagrids.

 

Dino Esposito works for Wintellect, where he takes on ADO.NET and ASP.NET training and consulting. He is the co-founder of VB-2-The-Max, and also contributes the Cutting Edge column to MSDN Magazine. You can reach Dino at dinoe@wintellect.com.


Show:
© 2014 Microsoft