Export (0) Print
Expand All

Schemas in ADO.NET 2.0

 

Bob Beauchemin
DevelopMentor

August 2004

Applies To:
   Microsoft ADO.NET 2.0
   Microsoft Visual Studio 2005
   C# programming language

Summary: Learn about the enhanced support in ADO.NET 2.0 for accessing metadata from your data sources. (13 printed pages)

Download the associated SchemasSample.exe sample code.

Contents

An In-Depth Look at the New Common Metadata API
Who Needs Metadata, Anyway?
What Metadata Can I Get?
Restrictions
DataSourceInformation
Customizing and Extending the Metadata
User Customization
Conclusion: Final Fragments of Metadata Support

An In-Depth Look at the New Common Metadata API

In my previous article, I pointed out that Visual Studio 2005 Server Explorer now uses a dialog box containing a list of .NET data providers (rather than OLE DB providers) to prompt for connection information. When you decide on a connection string and add a Data Connection, each Data Connection also displays a tree of information about the database objects (like tables, views, and stored procedures) visible directly through the connection. But where does this information come from? Is Visual Studio hard-coded to produce this information only for certain data providers, leaving me with an empty node if I write my own data provider or buy one from a third party? No, not in Visual Studio 2005. All of this good information is brought to you courtesy of the new Schema API in ADO.NET 2.0. I don't know if this is how Visual Studio does it, but here is code to get a list of tables in a database using the new APIs.

// uses a ADO.NET 2.0 named connection string in config file
// uses ADO.NET 2.0 ProviderFactory and base classes
// see previous article 
public static void GetListOfTables(string connectstring_name)
{
  ConnectionStringSettings s =  
    ConfigurationSettings.ConnectionStrings[connectstring_name];
  DbProviderFactory f = DbProviderFactories.GetFactory(s.ProviderName);
  using (DbConnection conn = f.CreateConnection())
  {
    conn.ConnectionString = s.ConnectionString;
    conn.Open();

    DataTable t = conn.GetSchema("Tables");
    t.WriteXml("tables.xml");
  }
} 

Who Needs Metadata, Anyway?

Metadata is a part of every data access API. Although the primary consumers of metadata are tools like Visual Studio or code generators like DeKlarit, they are not the only users. Application package designers may allow end-users to customize an application by adding new tables or new columns to existing tables. When end-users change the database schema like this, a general-purpose query and modification tool can use metadata to include the users' new tables in maintenance, backup, and other application functions just like they were built-in tables that ship with the application. Programmers can use metadata to write their own custom classes that derive from System.Data.Common.DbCommandBuilder and build insert, update, and delete commands for use with the DataSet. Builders of multi-database applications (that is, applications designed to run on the user's choice of database) can use metadata to maintain a common code base as much as possible, optimizing the data access code when needed.

It's better to expose the metadata through a generic metadata API than to have each consumer use the database-specific API. That way, tool writers can maintain a more manageable code base. Such an API must be very flexible as well, because there are four obstacles to consider when writing a generic API to expose metadata.

  1. The metadata collections and information differ between databases. For example, SQL Server users might want to expose a collection of Linked Servers, whereas Oracle users might be interested in information about Oracle Sequences.
  2. The underlying system tables in which common database metadata is stored is different, not only in different database products, but even in different versions of the same database. For example, SQL Server 2005 exposes its metadata using new tables under a "sys" schema (for example, sys.tables) while previous versions of SQL Server use metadata tables, such as sysobjects, to store the same data.
  3. Different programs may have different views of metadata. As an example, many programmers complain about long lists of tables in an Oracle database because most metadata APIs mix "system" tables with user tables. They'd like to have a short list that consists only of tables they defined.
  4. Whether or not to support metadata at all, and how much metadata to provide, should be entirely up to the provider writer.

Most database APIs provide a standard set of metadata that all providers must support and allow provider writers to add new metadata tables. This is consistent with the approach taken by ANSI SQL standard. The part of the standard that addresses this is the Schema Schemata (INFORMATION_SCHEMA and DEFINITION_SCHEMA), part 11. The ANSI SQL INFORMATION_SCHEMA defines of standard set of metadata views to be supported by a compliant database. But even the spec needs to have a way to address the points above. It states:

"Implementers are free to add additional tables to the INFORMATION_SCHEMA or additional columns to the pre-defined INFORMATION_SCHEMA tables."

As an example of a data access API consistent in concept with the ANSI SQL standard, OLE DB defined a series of metadata that it called "Schema Rowsets". It started with a predefined set that roughly followed the INFORMATION_SCHEMA and added OLE DB-specific columns to each one. ADO.NET 2.0 provides an even more powerful and flexible mechanism to expose metadata.

What Metadata Can I Get?

ADO.NET 2.0 permits a provider writer to expose five different types of metadata. These main metadata "meta-collections" or "categories" are enumerated in the class System.Data.Common.DbMetaDataCollectionNames.

  • MetaDataCollections—A list of metadata collections available.
  • Restrictions—For each metadata collection, an array of qualifiers that can be used to restrict the scope of the schema information requested.
  • DataSourceInformation—Information about the instance of the database the data provider references.
  • DataTypes—A set of information about each data type the database supports.
  • ReservedWords—Reserved words for that database's query language. Usually "query language" equates to a SQL dialect.

MetaDataCollections is the name for the INFORMATION_SCHEMA collections, like "Tables", "Columns", or "PrimaryKeys". Using DbConnection.GetSchema, however, these metadata categories are also considered metadata. What this means in terms of code is that these collections can be obtained like ordinary metadata.

// gets information about database Views
Table t1 = conn.GetSchema("Views");
// gets information about collections exposed by this provider
// this includes the five "meta-collections" described above
Table t2 = conn.GetSchema(DbMetaDataCollectionNames.MetaDataCollections);
// gets information about the Restrictions meta-collection
Table t3 = conn.GetSchema(DbMetaDataCollectionNames.Restrictions);
// No argument overload is same as asking for MetaDataCollections
Table t4 = conn.GetSchema();

Two of the five meta-collections deserve further explanation.

Restrictions

Restrictions can be used to limit the amount of metadata returned. If you are familiar with OLE DB or ADO, the term "restriction" means the same thing in those APIs. As an example, let's use the MetaDataCollection "Columns", which is the set of Column names in tables. This collection can be used to get all of the Columns in all tables. However, the set of columns requested can be restricted by database name, by owner/schema, or by table. Each metadata collection can have a different number of possible restrictions and each restriction can have a default. Following along with our example, here's an XML representation of the restrictions for the Columns metadata:

Listing 1. Restrictions on the Columns Collection (XML format)

<Restrictions>
  <CollectionName>Columns</CollectionName> 
  <RestrictionName>Catalog</RestrictionName> 
  <RestrictionDefault>table_catalog</RestrictionDefault> 
  <RestrictionNumber>1</RestrictionNumber> 
</Restrictions>
<Restrictions>
  <CollectionName>Columns</CollectionName> 
  <RestrictionName>Owner</RestrictionName> 
  <RestrictionDefault>table_schema</RestrictionDefault> 
  <RestrictionNumber>2</RestrictionNumber> 
</Restrictions>
<Restrictions>
  <CollectionName>Columns</CollectionName> 
  <RestrictionName>Table</RestrictionName> 
  <RestrictionDefault>table_name</RestrictionDefault> 
  <RestrictionNumber>3</RestrictionNumber> 
</Restrictions>
<Restrictions>
  <CollectionName>Columns</CollectionName> 
  <RestrictionName>Column</RestrictionName> 
  <RestrictionDefault>column_name</RestrictionDefault> 
  <RestrictionNumber>4</RestrictionNumber> 
</Restrictions>

Restrictions are specified using an overload of DbConnection.GetSchema. The restrictions are specified as an array. You can specify an array as large as the entire restrictions collections or a subset array, because "RestrictionNumbers" usually progress from least restrictive to most restrictive. Use a null value (not database NULL, but .NET null, or Nothing in Visual Basic .NET) for restriction values you want to leave out. For example:

// restriction string array
string[] res = new string[4];

// all columns, all tables owned by dbo
res[1] = "dbo";
DataTable t1 = conn.GetSchema("Columns", res);

// clear collection
for (int i = 0; i < 4; i++) res[i] = null;
// all columns, all tables named "authors", any owner/schema
res[2] = "authors";
DataTable t2 = conn.GetSchema("Columns", res);

// clear collection
for (int i = 0; i < 4; i++) res[i] = null;
// columns named au_lname 
// all tables named "authors", any owner/schema
res[2] = "authors";  res[3] = "au_lname";
DataTable t3 = conn.GetSchema("Columns", res);

// clear collection
for (int i = 0; i < 4; i++) res[i] = null;
// columns named au_lname 
// any tables, any owner/schema
res[3] = "name";
DataTable t4 = conn.GetSchema("Columns", res);
   

You need not specify the entire array of restrictions. In the case above, where you'd like to see only columns in tables owned by "dbo", you can specify an array with only two members instead of all four. Note also that specifying an empty string as a restriction is different than specifying a null (Nothing in Visual Basic .NET) value. You do not need to memorize the restrictions; you can always query for them, just like any other collection. The "Restrictions" collection itself does not allow restrictions, but because the information is fetched into a DataTable, you can use a DataView to provide similar functionality, as shown below.

DataTable tv = conn.GetSchema(DbMetaDataCollectionNames.Restrictions);
DataView v = tv.DefaultView;
// show restrictions on the "Columns" collection, sorted by number
v.RowFilter = "CollectionName = 'Columns'";
v.Sort = "RestrictionNumber";
for (int i = 0; i < tv.Count; i++)
  Console.WriteLine("{0} (default){1}",
    tv.Rows[i]["RestrictionName"], 
    tv.Rows[i]["RestrictionDefault"]);

DataSourceInformation

The DataSourceInformation collection provides information about the current instance of the database (data source) for query builders. Although this collection can contain anything the provider desires, in the Microsoft providers (SqlClient, OracleClient, OleDb, Odbc) this collection contains similar information. Here's the information you get by default.

Table 1. DataSourceInformation in Microsoft providers

ValueFormat/Meaning
CompositeIdentifierSeparatorPatternSeparator for multipart names, (e.g., the dot in pubs.dbo.authors)
DataSourceProductNameDatabase name
DataSourceProductVersionDatabase version. Note that this is the version of the database instance currently being accessed by the DbConnection.
DataSourceProductVersionNormalized 
GroupByBehaviorEnumeration, System.Data.Common.GroupByBehavior
IdentifierPatternRegular expression string
IdentifierCaseEnumeration, System.Data.Common.IdentifierCase
OrderByColumnsInSelectBoolean, should you ORDER BY the columns in a SELECT statement by default
ParameterMarkerFormatIndicates whether parameter markers begin with a special character (e.g., @ for T-SQL)
ParameterMarkerPatternRegular expression string, used to create parameters
ParameterNameMaxLengthMaximum length of a parameter
ParameterNamePatternRegular expression string, used to create parameters
QuotedIdentifierPatternRegular expression string, used to quote identifiers
QuotedIdentifierCaseEnumeration, System.Data.Common.IdentifierCase
StatementSeparatorPatternRegular expression string
StringLiteralPatternRegular expression string
SupportedJoinOperatorsEnumeration, System.Data.Common.SupportedJoinOperators

More than enough information to produce SQL for a particular database dialect, don't you think? There is just one more piece of information I'd like, and that's whether the provider uses named parameters or positional parameters in parameterized queries. I mentioned named and positional parameters as two ways to write parameterized commands in my last article about writing provider-independent code.

Customizing and Extending the Metadata

Now that we've seen the base metadata that is provided and can find our way around DbConnection.GetSchema(), let's look at the ways provider writers can customize metadata using a simple declarative format, and how programmers can hook into that format. This discussion ties back to our metadata complications at the beginning of the article: how to provide database-version-independent metadata and how to deal with the fact that different customers may want different views of the same metadata.

First, let's point out that metadata support is completely optional. Providers do not have to support DbConnection.GetSchema; this method can throw a NotSupportedException. In addition, only the MetaDataCollections category is required if the provider writer chooses to support DbConnection.GetSchema. Providers can choose not to supply any or all of the other four categories of information.

Next, each provider can expose different information for the same metadata collection. The structure of the Tables collection, for example, is entirely up to the provider writer. As an example, the SqlClient provider exposes four information items in the Tables collection: table_catalog, table_schema, table_name and table_type. The OracleClient provider exposes only three information items (OWNER, TABLE_NAME, and TYPE) because Oracle databases do not contain multiple catalogs. The number of restrictions and restriction items can be different for each provider. Using the case of Tables again, the SqlClient provider supports four restrictions, and the OracleClient provider supports two. The restrictions also need not occur in any specific order. So there is no mandated metadata structure, amount of metadata, or order of metadata, as there is in the OLE DB and ODBC APIs. The provider is free to expose whatever metadata is pertinent. However, if a specific application (like Visual Studio) desires that the metadata be consistent among all .NET data providers used in an application, it can obtain this behavior, too, by overriding the provider's normal behavior. We'll discuss this further in the User Customization section.

Provider writers can hard-code metadata logic directly into their providers, with each provider-writer using a potentially different internal algorithm for obtaining similar metadata. This is the way it's been done in the past, for example, in implementing the OLE DB ISchemaRowset method. In ADO.NET 2.0, however, there are some base classes available for provider writers in the System.Data.ProviderBase namespace. The four available Microsoft providers use these base classes, so they all implement schemas similarly. I'll use this implementation for exposition, hoping that major provider-writing players like DataDirect Technologies and other provider-writers will use it, too.

The base class for exposing metadata is DbMetaDataFactory. Providers that implement a subclass of it use an XML file to define their metadata-fetching behavior. These files are embedded resources in System.Data.dll and System.Data.OracleClient.dll; you can look at the raw XML files by running ILDASM.exe from the command line.

>ildasm.exe System.Data.dll /out:dummy.il

Looking at the XML resource files produces from ILDASM peels another layer off the onion. The file enumerates the collections that are supported and the information contained in each meta-collection (through the schema) and appears to be the output of the DataSet.WriteXml method using the DataSet.WriteXml(XmlWriteMode.WriteSchema) overload. The most interesting bits are the MinimumVersion/MaximumVersion elements in all the meta-collections except DataSourceInformation and the PopulationMechanism/PopulationString subelements in the MetaDataCollections elements.

Using MinimumVersion/MaximumVersion allows the provider writer to specify which metadata queries to execute for different versions of the database. By using multiple elements for a single MetaDataCollection, you can make GetSchema act differently for different versions of the database. As an obvious example, you could use different versions for SQL Server 2005 than for previous versions of SQL Server. Here's an example of using MinimumVersion from the SQL Server metadata resource, System.Data.SqlClient.SqlMetaData:

Listing 2. Metadata Entry for Data Type XML in Data Types Collection

<DataTypes>
<TypeName>xml</TypeName>
<ProviderDbType>25</ProviderDbType>
<ColumnSize>2147483647</ColumnSize>
<DataType>System.String</DataType>
<IsAutoIncrementable>false</IsAutoIncrementable>
<IsCaseSensitive>false</IsCaseSensitive>
<IsFixedLength>false</IsFixedLength>
<IsFixedPrecisionScale>false</IsFixedPrecisionScale>
<IsLong>true</IsLong>
<IsNullable>true</IsNullable>
<IsSearchable>true</IsSearchable>
<IsSearchableWithLike>false</IsSearchableWithLike>
<MinimumVersion>09.00.000.0</MinimumVersion>
<IsLiteralSupported>false</IsLiteralSupported>
</DataTypes>

This defines information about the SQL Server data type XML. The MinimumVersion indicates that this data type is only available when using SQL Server 2005. If you ask SqlConnection.GetSchema for a list of data types that the database supports, only SQL Server 2005 databases (SQL Server 2005 is version 9, the current beta 2 version is 09.00.852.2) will report that they support the XML data type.

For collections usually exposed by the INFORMATION_SCHEMA (like Tables, Views, or Stored Procedures), PopulationMechanism and PopulationString are where things get started. There are three PopulationMechanisms used in this implementation: DataTable, SQLCommand, and PrepareCollection. DataTable is used to populate the meta-collections. Using DataTable means that the information used to populate the collection is in the XML resource file itself. In each case, the PopulationString is the name of the DataTable produced when the XML resource file is loaded into a .NET DataSet. SQLCommand means that the provider will use a DbCommand instance to issue the command against the database. If you look at one of the PopulationStrings of a collection produced by a SQLCommand:

Listing 3. Entry for Databases (Catalogs) in SQL Server - MetaDataCollection

  <MetaDataCollections>
    <CollectionName>Databases</CollectionName>
    <NumberOfRestrictions>1</NumberOfRestrictions>
    <NumberOfIdentifierParts>1</NumberOfIdentifierParts>
    <PopulationMechanism>SQLCommand</PopulationMechanism>
    <PopulationString>select name as database_name, dbid, crdate as 
create_date from master..sysdatabases where name = {0}</PopulationString>
  </MetaDataCollections>

It's fairly easy to deduce that String substitution will be applied to the "base query" when restrictions are used in DbConnection.GetSchema. If no restrictions are specified that predicate will effectively be stripped out of the query.

The provider writer can use a custom mechanism when the value of PopulationMechanism is PrepareCommand. There is a PrepareCommand method of DbMetaDataFactory that, if overridden by the provider writer, can be coded to use whatever custom semantics the provider chooses. This mechanism is used in SqlClient to produce the DataTypes meta-collection. The SqlMetaDataFactory subclass implementation of PrepareCommand first reads the built-in data types supported by SQL Server from the DataTable, as with other meta-collections, and then uses custom logic to add user-defined types to the collection if the database is SQL Server 2005. (Note: SQL Server 2005 can expose .NET classes as user-defined types. See Chapter 5 of A First Look at SQL Server 2005 for Developers for more information.)

User Customization

In addition to the provider customization mechanism, there is also a hook that allows programmers to customize schema information on a per-application basis! Before loading the embedded resource, the DbConnectionFactory CreateMetaDataFactory will consult the application configuration file. Each provider can implement CreateMetaDataFactory to retrieve the XML stream for its DbMetaDataFactory in whatever way it chooses, but the four Microsoft providers follow a common pattern. Each Microsoft provider will look for an application configuration setting named after the provider itself (e.g., system.data.sqlclient). In this setting element you can add or remove name-value pairs. DbMetaDataFactory looks for a name "MetaDataXml". The value corresponding to the special name is the name of a file. This is a simple file name—the file must exist in the CONFIG subdirectory of the location where .NET is installed. This is the directory where machine.config and the security configuration settings live. This file must contain the entire set of Schema configuration information, not just the changes.

You can use this mechanism, for providers that support it, for many reasons. For example, you could change the schema queries in the OracleClient provider to use the USER catalog views rather than the ALL catalog views. Because the USER views don't contain information about internal database tables, the list of Tables, for example, will be much shorter and easier to work with. Another example might consist of coding out metadata XML files for all .NET data providers that give you a consistent standard set of metadata, possibly one that corresponds exactly to the SQL-99 INFORMATION_SCHEMA views. This might be just right for your application.

A more concrete example would be if I wanted to expose information about SQL Server Service Broker metadata collections in SQL Server 2005. These collections might include QUEUEs, SERVICEs, CONTRACTs, and message types. I would start with the embedded XML resource file and embellish it with information on my new collections. If the file name was SQLBrokerAware.xml, I would install the file and my application configuration file would look like this:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.data.sqlclient>
    <settings>
       <add name="MetaDataXml" value="SQLBrokerAware.xml"></add>
    </settings>
  </system.data.sqlclient>
</configuration>

That's all there is to it! Using that setup, I could write code in which Service Broker metadata is part of the built-in metadata available to the client. The code might look like this for all of the Queues:

using (SqlConnection conn = new SqlConnection(connstring))
{
  conn.Open();
  // this includes Service Broker metadata collections
  Table t = conn.GetSchema(DbMetaDataCollectionNames.MetaDataCollections);
  // get all the queues in my database
  Table queues = conn.GetSchema("Queues");
}

Very cool indeed! A code sample that adds Service Broker metadata is included with this article. Although this is a very powerful feature, it does have the capability to be abused. Remember that you'll need to distribute the metadata XML file with every application that uses it and convince the System Administrator to install it in the CONFIG directory for you. And you'll need to maintain it with each new version of the provider that ships. Because one of the reasons for generic metadata APIs is to have consistent metadata across databases and applications, this feature should not be used gratuitously. Note also that you cannot provide a custom implementation of PrepareCommand at this time.

As a final remark on customization, you might have guessed that customization and resources work differently with the bridge providers for OLE DB and ODBC. When you use these providers, the default Odbc or OleDb XML resource files are used, and you can customize not only the main Odbc or OleDb schema behaviors, but also customize the behavior on a per-provider basis. If you want to specify your own providers or drivers, the name attribute used in adding/removing settings subelements would not be MetaDataXml, but instead would be [providershortname]:MetaDataXml. If you want your file to be the default for the OleDb or Odbc data provider, you can even specify a name of defaultMetaDataXml.

Conclusion: Final Fragments of Metadata Support

In closing, I'd just like to mention two other metadata extensions that are not exposed through DbConnection.GetSchema. The DbCommandBuilder includes two properties, QuoteIdentifier and UnquoteIdentifier, which permit you to customize identifiers in commands that the CommandBuilder builds. As an example, in SQL Server you can use double-quote (") or brackets ('[' and ']') to quote identifiers, depending on your session settings. Finally, the class SqlMetaData in the System.Data.Sql namespace is used to expose the metadata in SQL Server DataReaders and allows you to set this metadata in Parameters used by SqlCommand. This metadata is similar in concept to, though not anywhere near as detailed as, the metadata exposed in the DataTypes meta-collection. SqlMetaData is useable by both the SqlClient data provider and the in-database SqlServer data provider included with SQL Server 2005. SqlMetaData augments the rowset metadata exposed currently by the method SqlDataReader.GetSchemaTable().

At this point you'll likely agree that the ADO.NET 2.0 metadata infrastructure is the most powerful, flexible, and customizable you've seen. It expands the picture of ADO.NET as a complete object-oriented database API. Tool vendors, provider writers, and users of Visual Studio data tools will be dancing in the streets—see you there.


Bob Beauchemin is an instructor, course author, and database curriculum course liaison for DevelopMentor. He has over twenty-five years of experience as an architect, programmer, and administrator for data-centric distributed systems. He's written articles on ADO.NET, OLE DB and SQL Server for Microsoft Systems Journal and SQL Server Magazine and others, and is the author of A First Look at SQL Server 2005 for Developers and Essential ADO.NET.

Show:
© 2015 Microsoft