This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.

MSDN Magazine

Cutting Edge
SQL Server 7.0 and OLE DB Heterogeneous Queries
Dino Esposito
Code for this article: Cutting0800.exe (47KB)
U

niversal Data Access (UDA) is Microsoft's solution for integrating different data sources into applications. OLE DB provides an API that allows you to expose heterogeneous data sources through a common set of software toolsâ€"specifically COM interfaces. While it is relatively easy to devise a common API to expose proprietary data in a standardized way, creating a general-purpose, universally valid mechanism for querying data is not as easy. In fact, it probably can't be done.
      Structured Query Language (SQL) is as close as the software community has come to this kind of standardization. SQL has been an ISO standard since 1992. Almost all of the best-selling relational database management systems (RDBMS) available today support SQL, but years of real-world experience prove that not even SQL can be considered a definite standard if you move beyond the entry level of the SQL-92 specification. As a result, you don't really have a standard language for data query, even though that is the way SQL is perceived and used.
      SQL is mostly a query language for relational databases. However, that point can be debated now that some Windows Management Instrumentation (WMI) providers ship with a query language that mimics SQL. From the SQL-92 specification, you get a query language that is standardized enough to work with ODBC drivers and relational data. But the UDA specification needs a query language that is universally accepted, as SQL is today. However, it must be more powerful and more broadly supported than the full SQL-92 spec. To make things even more complicated, UDA also addresses nonrelational and hierarchical data types. SQL-92 was not conceived with anything other than relational tables in mind. Data Shaping SQL extensions introduced with ActiveX® Data Objects (ADO) 2.0 is an example of this.
      Data shaping enables the creation of nested recordsets by introducing the chapter data typeâ€"a record field that contains a child recordset. In practical terms, shaping extensions aren't an extension of the SQL language, but a brand new language built on top of SQL and processed by an OLE DB service. To get nested recordsets, you connect to a special OLE DB data provider and use its SHAPE language to query for data. The SHAPE language is not SQL-compliant.
      When you write custom OLE DB providers you may define your own query languageâ€"that is, you can invent the syntax that your module will understand. There are no guidelines on how to do this, and I don't think there could be. Each OLE DB providerâ€"from RDBMS to system services like ADSI or Index Server and from ODBC to custom providersâ€"understands a different language. The lack of a universal query language clearly influences the way you would architect data queries that span across heterogeneous data sources.

Heterogeneous Queries

      A heterogeneous query (also known as a distributed query) involves more than one database server. For example, you can run a query that returns a joined recordset where some records come from a SQL Serverâ„¢ database and others come from an AS/400 machine. Since neither a universal language nor a universal query processor exist today (and probably won't exist in the near future), any heterogeneous query must pass through a module that can understand the overall syntax, identify and run all the subqueries, and put the various pieces together. In short, running a distributed query requires a specialized app or module that behaves like a distributed query coordinator, as opposed to the Distributed Transaction Coordinator (DTC) found in the Microsoft Transaction Services (MTS) architecture and the Windows® 2000 Component Services.
      One application that can act as a distributed query coordinator is SQL Server 7.0 (and higher). Earlier versions of SQL Server allowed access to external data sources mostly through remote stored procedures running on other SQL Server-equipped machines. However, the distributed query mechanism introduced with SQL Server 7.0 is more flexible because now objects in remote data sources (such as tables and views) can be referenced directly in Transact-SQL statements such as SELECT, INSERT, UPDATE, and DELETE. Furthermore, distributed queries rely on OLE DB as the underlying engine. This means that you can also extend queries to access nonrelational data as long as an OLE DB provider exists for the data. The SQL Server 7.0 distributed queries go beyond the traditional relational DBMS systems with SQL query processors. If you have a piece of software capable of exposing data in a tabular format (rowset) through the OLE DB COM interfaces, then you can treat that data as a SQL Server table and combine and process it via a slightly extended version of Transact-SQL.
      The query mechanism built into SQL Server 7.0 and SQL Server 2000 uses a special syntax to recognize a distributed query and is capable of breaking the command text into pieces and running all the resulting subqueries. Architecturally speaking, this is quite a general schema and is not specifically tied to SQL Server as an RDBMS. As mentioned earlier, you can merge data coming from a SQL Server and an Oracle table, but you could also ask SQL Server to retrieve and merge data from heterogeneous sources like Microsoft® Access and a custom OLE DB provider.
      I can't predict Microsoft's future plans for SQL Server, but I certainly do hope that the current SQL Server query processor becomes a separate and standalone component for preparing and running distributed queries.

Linked Servers

      Let's see how the SQL Server 7.0 mechanism for distributed queries actually works, and how to take advantage of these queries in real-world applications. Logical elements called linked servers are central to distributed queries. Figure 1 shows the node in the SQL Server Enterprise Manager where all the existing linked servers are listed.

Figure 1 Linked Servers Node
Figure 1 Linked Servers Node

      A linked server is an element in the SQL Server environment that gathers all the information needed to execute OLE DB commands. The OLE DB providers involved in this operation can reside on different servers. Linked servers offer several advantages. For one thing, they provide remote server access plus the ability to run queries and execute commands on heterogeneous data sources across the enterprise information system. In addition, linked servers provide a uniform interface to work with different data sources (see Figure 2).

Figure 2 Heterogenous Query Architecture
Figure 2 Heterogenous Query Architecture

      When configuring a linked server, you just specify an OLE DB provider and an OLE DB data source. There are two ways to do this: manually through the SQL Server Enterprise Manager or programmatically via a system-provided stored procedure. To create a linked server interactively from the dialog shown in Figure 1, right-click the Linked Servers node and then select the New Linked Server command. Figure 3 shows the New Linked Server dialog.

Figure 3 Creating a New Linked Server
Figure 3 Creating a New Linked Server

      Another way of creating linked server configurations is through the Query Analyzer or any other environment that allows you to run a couple of stored procedures. (An ADO application, for example, is fine as well.) The following code creates a linked server to a Microsoft Access database:

 

  EXEC sp_addlinkedserver 'NWind', '', 
  
'Microsoft.Jet.OLEDB.4.0',
'd:\nwind.mdb'

 

 

      The stored procedure can accept up to seven arguments, as shown here:

 

  sp_addlinkedserver [@server =] 'server' 
  
[, [@srvproduct =] 'product_name']
[, [@provider =] 'provider_name']
[, [@datasrc =] 'data_source']
[, [@location =] 'location']
[, [@provstr =] 'provider_string']
[, [@catalog =] 'catalog']

 

 

The first argument is the name of the new server, and all the others concur to unambiguously identify the data source and the OLE DB provider that wraps it. When a user attempts to execute a distributed query that accesses a linked table, the local server must be able to log into the linked server on behalf of the user. While creating a new linked server, sp_addlinkedserver also establishes a default mapping between all logins on the local server and remote logins. This mapping assumes that SQL Server utilizes the local login's user credentials. If this doesn't suit your needs, you can use another system stored procedure, sp_addlinkedsrvlogin, to specify the new settings.

 

  EXEC sp_addlinkedsrvlogin 'NWind', false, 
  
NULL, 'Admin', NULL

 

 

      Note that SQL Server can silently use the Windows NT® user name and password to connect to a linked server and run a query on behalf of the current user. This requires that the user is connected to SQL Server via Windows NT authentication mode. In addition, the provider must support the same authentication mode. The linked server is responsible for the permissions on the various objects contained in the remote database.
      To programmatically remove the mapping between a local SQL Server login and a linked server's login, use the sp_droplinkedsrvlogin stored procedure. To remove a linked server altogether, use sp_dropserver instead.
      Once you have a fully functional linked server, you can use it whenever you would use a native SQL Server table:

 

  SELECT * FROM NWind.Northwind.dbo.Employees
  

 

 

      You should always use complete names when working with linked server objects. The fully qualified name includes the linked server name, the catalog, the schema, and the object name.

The OPENROWSET Keyword

      When it comes to issuing distributed queries a linked server is useful, but it turns out to be a predetermined mapping between a local and a remote database object. There might be circumstances in which you want to dynamically specify a query string and get a rowset from a non-SQL Server data source.
      The SQL Server 7.0 Transact-SQL language provides a new made-to-measure keyword called OPENROWSET. Functionally speaking, OPENROWSET and linked servers provide you with the same capability. You should use linked servers for data sources that you use regularly and resort to an ad hoc connector like OPENROWSET for occasional accesses to a remote OLE DB data source. OPENROWSET is a keyword defined to include all connection information necessary to access an OLE DB data source. It establishes a one-time connection and can be used as the target of a FROM clause or an INSERT or DELETE statement. All the information contained in an OPENROWSET statement can be seen and used wherever a table name is needed.
      OPENROWSET always returns a single recordset, even though a query may be designed to return multiple sets of records. In the following code snippet OPENROWSET takes the first record and ignores all the others.

 

  OPENROWSET('provider_name' 
  
{
'datasource';'user_id';'password'
| 'provider_string'
},
{
[catalog.][schema.]object
| 'query'
})

 

 

      Depending on the characteristics of the OLE DB provider you're addressing, catalog and schema names are not always necessary. They are mandatory only if the provider supports multiple catalogs and schemas in the given data source. Catalog and schema names can be omitted if the provider does not support them.
      The query string is any string that the specified provider can understand and process. If the provider doesn't target a DBMS, this string is written in a custom, provider-specific language; otherwise, it is SQL. As mentioned earlier, a unified language or a unified model for issuing heterogeneous queries would be a good next step in the march toward software standardization, following HTTP for networking, XML for data description, and SOAP for method invocation. If XML takes root as an interchange tool between clients and database servers, then Extensible Query Language (XQL), properly revised by the industry and standardized by the W3C, might be the query language of the future. Maybe there will be a sort of XSQL in the next couple of years.

The OPENQUERY Keyword

      OPENQUERY is a new Transact-SQL keyword that executes a pass-through query on a given linked server. Like OPENROWSET, OPENQUERY can be employed in a FROM clause or as the target pseudo-table for an INSERT, UPDATE, or DELETE statement. Of course, both OPENQUERY and OPENROWSET can update the database only if the provider supports that feature. While this is standard for providers directly wrapping a DBMS, some custom OLE DB providers are read-only and don't provide update features.
      The OPENQUERY syntax is straightforward:

 

  OPENQUERY(linked_server, 'query')
  

 

 

You just need to specify the name of the linked server and the text of the query to run. Like OPENROWSET, if the query returns multiple recordsets OPENQUERY only takes the first one into account.

Using OPENROWSET

      Let's look at a few practical examples of how to use the OPENROWSET keyword with Transact-SQL.

Figure 4 Issuing a Distributed Query
Figure 4 Issuing a Distributed Query

Figure 4 shows how the SQL Server Query Analyzer processes the following query:

 

  SELECT firstname,lastname,notes
  
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'd:\nwind.mdb';'admin';'', Employees)

 

 

The first argument is the name of the OLE DB provider. This must evaluate to the name used to register the provider with the system registryâ€"the progID of the COM object that implements the provider. The second argument contains the information needed to identify and connect to the data source. All the tokens are separated by a semicolon. For Microsoft Access tables, this information corresponds to the name of the MDB file, user name, and password. There's also an optional fourth token, as described by the syntax shown previously. It refers to extra information that the provider may need. The third argument contains the object to accessâ€"the physical table or the query to run. In the previous example, I've used the table name (Employees). Provided that it makes sense, you could also specify a query string:

 

  SELECT lastname
  
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'd:\nwind.mdb';'admin';'',
'select * from Employees')

 

 

Figure 5 shows the final rowset with just one column.

Figure 5 Querying Last Names Only
Figure 5 Querying Last Names Only

      You probably would not want to use SQL Server 7.0 to access a simpler Access table in a real-world application. The power of distributed queries is particularly evident when heterogeneous data sources are involved. In the following SQL script the final recordset is a combination of a SQL Server table and an Access table.

 

  USE [My Articles]
  
SELECT m.[Name], a.Title, a.Issue, m.Web
FROM Magazines AS m
INNER JOIN
OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'd:\My documents\articles.mdb';'admin';'',
Articles)
AS a
ON m.MagID = a.Magazine

 

 

      This SQL Server database is called My Articles and contains a single table called Magazines. This table provides information about computer magazinesâ€"name, publisher, URL, and frequency of publication. The Access database contains information about articles written for each magazine. While in a real-world app you normally don't use SQL Server and Access together, during development time and when there is critical legacy data you may mix data from two or more sources.
      The previous code snippet executes an inner join on the value of the magazine ID field. Figure 6 shows the result in the SQL Server Query Analyzer. Of course, the previous command is only a query string that SQL Server 7.0 runs, but you can also issue it from within other programming environments, including Visual Basic®, ASP, Visual C++®, and wherever ADO and OLE DB can be successfully used.

Figure 6 Joining SQL Server and Access Tables
Figure 6 Joining SQL Server and Access Tables

      Let's consider ADO and Visual Basic now.

 

  Dim cn As New ADODB.Connection
  
Dim rs As New ADODB.Recordset
cn.Provider="SQLOLEDB"
cn.Open "Initial Catalog=My Articles", "sa"
rs.Open Text1.Text, cn

 

 

The OLE DB provider for the ADO Connection object must be SQLOLEDBâ€"that is the provider for SQL Server 7.0. In the Connection's Open method specify the initial catalog, the user name, and the password for login. The command text is built like this:

 

  buf = ""
  
buf = buf & "SELECT m.[Name], a.Title, a.Issue "
buf = buf & "FROM Magazines AS m INNER JOIN "
buf = buf & "OPENROWSET('Microsoft.Jet.OLEDB.4.0',"
buf = buf & "'d:\articles.mdb';'admin';'', Articles) "
buf = buf & "AS a ON m.MagID = a.Magazine"
Text1.Text = buf

 

 

The results are shown in Figure 7. As you may have guessed, the Visual Basic-based program uses an ActiveX control called DBReportViewâ€"my favorite grid control, from the June 1999 issue of MSJ. You can download the source code at https://www.microsoft.com/msj/0699/oledb/oledb.htm.

Figure 7 A Visual Basic-based Query
Figure 7 A Visual Basic-based Query

      Since readers who use C++ have complained that data access demos are usually written in Visual Basic, I'll rewrite this same demo using MFC and the ATL OLE DB consumer templates. Suppose you have a dialog-based MFC application whose UI contains a listview. Figures 8 and 9 show how to add a new ATL data access object. It refers to the SQL Server OLE DB provider and points primarily to the dbo.Magazines table.

Figure 8 Connecting to SQL Server 7.0
Figure 8 Connecting to SQL Server 7.0

The wizard generates a dboMagazines.h file whose accessor class shapes a command that returns all the fields from the table. The command text is defined through the DEFINE_COMMAND macro. You need to change the structure of the accessor class as well as the macro to reflect the rowset you want to obtain. Figure 10 shows the modified code for dboMagazines.h.

Figure 9 Selecting a Table
Figure 9 Selecting a Table

      The following snippet loops through the rowset records and fills up a three column report listview:

 

  CdboMagazines arts;
  
arts.Open();
while(arts.MoveNext() == S_OK) {
m_pLV->InsertItem(i, arts.m_Name);
m_pLV->SetItemText(i, 1, arts.m_Title);
m_pLV->SetItemText(i, 2, arts.m_Issue);
i++;
}

 

 

The final result is shown in Figure 11.

Figure 11 MFC and ATL Query
Figure 11 MFC and ATL Query

Custom OLE DB Providers

      So far I've used SQL Server tables along with Access tables. However, the underlying interface is always OLE DB, which means that any OLE DB-compliant data source could be used for joins or other data combinations. In the article "Exposing Your Custom Data in a Standardized Way Through ADO and OLE DB" (MSJ, June 1999), I demonstrated how to do this by selecting all the e-mail messages received from people listed in my SQL Server Contacts database. By slightly improving the OLE DB provider I presented last month (see the Cutting Edge column in the July 2000 issue of MSDN® Magazine), you could scan your disks and retrieve all the documents authored by any of the names in a certain database. Last month, in fact, I developed an OLE DB provider to return a recordset of file names using file type and author name as additional fields for query.
      However, while developing custom OLE DB providers to be used within heterogeneous queries you should be aware of an insidious bug in one of the ATL header filesâ€"specifically atldb.h. This bug prevents custom OLE DB providers written using ATL templates from working properly when interrogated by SQL Server 7.0 within a heterogeneous query.
      The fundamental problem that causes the misbehavior has two faces (see Knowledge Base article Q198520 at https://support.microsoft.com/support/kb/articles/Q198/5/20.asp). First, The ATL IRowsetImpl::GetColumnInfo method returns incorrect information. Second, the IRowsetImpl::GetData method doesn't handle provider-owned memory the right way. Besides the memory leak problem, the bug in GetColumnInfo makes it impossible for the SQL Server query mechanism to obtain correct column information from the provider. The file that needs to be fixed is atldb.h.
      The solution comes with the atldbfix.exe file that you can download as part of the previously mentioned Knowledge Base article. Unzip the self-extracting executable and replace the existing atldb.h with a file named atldbfix.h. (All the necessary steps are explained in detail in the enclosed readme file.) This bug affects only the OLE DB providers you write through ATL and Visual Studio® 6.0.
      While I'm on the topic of bugs in the ATL classes for OLE DB, let me say a few more words about the PROVIDER_COLUMN_ENTRY macro. You use this macro when specifying the columns that form the provider-returned recordset. The ATL wizard-generated code erroneously returns a fixed-length string instead of a variable length string. As a result, SQL Server complains about an unexpected data length for the column each time the length of the actual data doesn't match the maximum size of the field. To solve the problem, use PROVIDER_COLUMN_ENTRY_STR unless you're really using fixed-length data for that column.

Conclusion

      Distributed queries are clearly useful for obtaining data from remote servers. As an added bonus, due to the underlying OLE DB interface, they can now span heterogeneous data sources and are no longer limited to relational and SQL databases. While the integration of diverse data types into the same database server is a requirement that many modern RDBMSs offer, only SQL Server 7.0 provides the ability to join heterogeneous data. Transact-SQL supports a couple of keywords (OPENROWSET and OPENQUERY) that allow you to expose as pseudo-tables any rowset obtained through a custom query language. Check the SQL Server Books Online for more information on pseudo-tables and definitely consider using this technology if you're struggling with integrating data from disparate sources within a single application.

Dino Esposito is a senior trainer and consultant based in Rome. He has written Windows Script Host Programmer's Reference (WROX, 1999). You can reach Dino at desposito@vb2themax.com.

From the August 2000 issue of MSDN Magazine.