This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links.
SQL Server 7.0 and OLE DB Heterogeneous Queries | |||||||||||||||||||||||||||||||||
Dino Esposito | |||||||||||||||||||||||||||||||||
Code for this article: Cutting0800.exe (47KB) | |||||||||||||||||||||||||||||||||
niversal Data Access (UDA) is Microsoft's solution for integrating different data sources into applications. OLE DB provides an API that allows you to expose heterogeneous data sources through a common set of software toolsâ€"specifically COM interfaces. While it is relatively easy to devise a common API to expose proprietary data in a standardized way, creating a general-purpose, universally valid mechanism for querying data is not as easy. In fact, it probably can't be done. Heterogeneous Queries A heterogeneous query (also known as a distributed query) involves more than one database server. For example, you can run a query that returns a joined recordset where some records come from a SQL Serverâ„¢ database and others come from an AS/400 machine. Since neither a universal language nor a universal query processor exist today (and probably won't exist in the near future), any heterogeneous query must pass through a module that can understand the overall syntax, identify and run all the subqueries, and put the various pieces together. In short, running a distributed query requires a specialized app or module that behaves like a distributed query coordinator, as opposed to the Distributed Transaction Coordinator (DTC) found in the Microsoft Transaction Services (MTS) architecture and the Windows® 2000 Component Services. Linked Servers Let's see how the SQL Server 7.0 mechanism for distributed queries actually works, and how to take advantage of these queries in real-world applications. Logical elements called linked servers are central to distributed queries. Figure 1 shows the node in the SQL Server Enterprise Manager where all the existing linked servers are listed.
A linked server is an element in the SQL Server environment that gathers all the information needed to execute OLE DB commands. The OLE DB providers involved in this operation can reside on different servers. Linked servers offer several advantages. For one thing, they provide remote server access plus the ability to run queries and execute commands on heterogeneous data sources across the enterprise information system. In addition, linked servers provide a uniform interface to work with different data sources (see Figure 2).
When configuring a linked server, you just specify an OLE DB provider and an OLE DB data source. There are two ways to do this: manually through the SQL Server Enterprise Manager or programmatically via a system-provided stored procedure. To create a linked server interactively from the dialog shown in Figure 1, right-click the Linked Servers node and then select the New Linked Server command. Figure 3 shows the New Linked Server dialog.
Another way of creating linked server configurations is through the Query Analyzer or any other environment that allows you to run a couple of stored procedures. (An ADO application, for example, is fine as well.) The following code creates a linked server to a Microsoft Access database:
The stored procedure can accept up to seven arguments, as shown here:
The first argument is the name of the new server, and all the others concur to unambiguously identify the data source and the OLE DB provider that wraps it. When a user attempts to execute a distributed query that accesses a linked table, the local server must be able to log into the linked server on behalf of the user. While creating a new linked server, sp_addlinkedserver also establishes a default mapping between all logins on the local server and remote logins. This mapping assumes that SQL Server utilizes the local login's user credentials. If this doesn't suit your needs, you can use another system stored procedure, sp_addlinkedsrvlogin, to specify the new settings.
Note that SQL Server can silently use the Windows NT® user name and password to connect to a linked server and run a query on behalf of the current user. This requires that the user is connected to SQL Server via Windows NT authentication mode. In addition, the provider must support the same authentication mode. The linked server is responsible for the permissions on the various objects contained in the remote database.
You should always use complete names when working with linked server objects. The fully qualified name includes the linked server name, the catalog, the schema, and the object name. The OPENROWSET Keyword When it comes to issuing distributed queries a linked server is useful, but it turns out to be a predetermined mapping between a local and a remote database object. There might be circumstances in which you want to dynamically specify a query string and get a rowset from a non-SQL Server data source.
Depending on the characteristics of the OLE DB provider you're addressing, catalog and schema names are not always necessary. They are mandatory only if the provider supports multiple catalogs and schemas in the given data source. Catalog and schema names can be omitted if the provider does not support them. The OPENQUERY Keyword OPENQUERY is a new Transact-SQL keyword that executes a pass-through query on a given linked server. Like OPENROWSET, OPENQUERY can be employed in a FROM clause or as the target pseudo-table for an INSERT, UPDATE, or DELETE statement. Of course, both OPENQUERY and OPENROWSET can update the database only if the provider supports that feature. While this is standard for providers directly wrapping a DBMS, some custom OLE DB providers are read-only and don't provide update features.
You just need to specify the name of the linked server and the text of the query to run. Like OPENROWSET, if the query returns multiple recordsets OPENQUERY only takes the first one into account. Using OPENROWSETLet's look at a few practical examples of how to use the OPENROWSET keyword with Transact-SQL.
Figure 4 shows how the SQL Server Query Analyzer processes the following query:
The first argument is the name of the OLE DB provider. This must evaluate to the name used to register the provider with the system registryâ€"the progID of the COM object that implements the provider. The second argument contains the information needed to identify and connect to the data source. All the tokens are separated by a semicolon. For Microsoft Access tables, this information corresponds to the name of the MDB file, user name, and password. There's also an optional fourth token, as described by the syntax shown previously. It refers to extra information that the provider may need. The third argument contains the object to accessâ€"the physical table or the query to run. In the previous example, I've used the table name (Employees). Provided that it makes sense, you could also specify a query string:
Figure 5 shows the final rowset with just one column.
You probably would not want to use SQL Server 7.0 to access a simpler Access table in a real-world application. The power of distributed queries is particularly evident when heterogeneous data sources are involved. In the following SQL script the final recordset is a combination of a SQL Server table and an Access table.
This SQL Server database is called My Articles and contains a single table called Magazines. This table provides information about computer magazinesâ€"name, publisher, URL, and frequency of publication. The Access database contains information about articles written for each magazine. While in a real-world app you normally don't use SQL Server and Access together, during development time and when there is critical legacy data you may mix data from two or more sources.
Let's consider ADO and Visual Basic now.
The OLE DB provider for the ADO Connection object must be SQLOLEDBâ€"that is the provider for SQL Server 7.0. In the Connection's Open method specify the initial catalog, the user name, and the password for login. The command text is built like this:
The results are shown in Figure 7. As you may have guessed, the Visual Basic-based program uses an ActiveX control called DBReportViewâ€"my favorite grid control, from the June 1999 issue of MSJ. You can download the source code at https://www.microsoft.com/msj/0699/oledb/oledb.htm.
Since readers who use C++ have complained that data access demos are usually written in Visual Basic, I'll rewrite this same demo using MFC and the ATL OLE DB consumer templates. Suppose you have a dialog-based MFC application whose UI contains a listview. Figures 8 and 9 show how to add a new ATL data access object. It refers to the SQL Server OLE DB provider and points primarily to the dbo.Magazines table.
The wizard generates a dboMagazines.h file whose accessor class shapes a command that returns all the fields from the table. The command text is defined through the DEFINE_COMMAND macro. You need to change the structure of the accessor class as well as the macro to reflect the rowset you want to obtain. Figure 10 shows the modified code for dboMagazines.h.
The following snippet loops through the rowset records and fills up a three column report listview:
The final result is shown in Figure 11.
Custom OLE DB Providers So far I've used SQL Server tables along with Access tables. However, the underlying interface is always OLE DB, which means that any OLE DB-compliant data source could be used for joins or other data combinations. In the article "Exposing Your Custom Data in a Standardized Way Through ADO and OLE DB" (MSJ, June 1999), I demonstrated how to do this by selecting all the e-mail messages received from people listed in my SQL Server Contacts database. By slightly improving the OLE DB provider I presented last month (see the Cutting Edge column in the July 2000 issue of MSDN® Magazine), you could scan your disks and retrieve all the documents authored by any of the names in a certain database. Last month, in fact, I developed an OLE DB provider to return a recordset of file names using file type and author name as additional fields for query. Conclusion Distributed queries are clearly useful for obtaining data from remote servers. As an added bonus, due to the underlying OLE DB interface, they can now span heterogeneous data sources and are no longer limited to relational and SQL databases. While the integration of diverse data types into the same database server is a requirement that many modern RDBMSs offer, only SQL Server 7.0 provides the ability to join heterogeneous data. Transact-SQL supports a couple of keywords (OPENROWSET and OPENQUERY) that allow you to expose as pseudo-tables any rowset obtained through a custom query language. Check the SQL Server Books Online for more information on pseudo-tables and definitely consider using this technology if you're struggling with integrating data from disparate sources within a single application. |
|||||||||||||||||||||||||||||||||
Dino Esposito is a senior trainer and consultant based in Rome. He has written Windows Script Host Programmer's Reference (WROX, 1999). You can reach Dino at desposito@vb2themax.com. |
|||||||||||||||||||||||||||||||||
From the August 2000 issue of MSDN Magazine.