Build Your Own Research Library with Office 2003 and the Google Web Service API

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

Summary: Chris Kunicki discusses how to build a research library, a cool new feature of Microsoft Office 2003 that allows the easy research of external resources from within Office. (16 printed pages)

Chris Kunicki, OfficeZealot.com

March 3, 2003

Download the GoogleResearchLibrarySource.exe.

Contents

  • Overview

  • Introducing the Research Library

  • Build Your Own Research Library Service

  • Notes from the Trenches

  • About the Author

Overview

In my last column, I spoke on the subject of What's New with Smart Tags in Office 2003, and to my delight I received a tremendous amount of positive and enthusiastic interest in the next version of Microsoft Office, which has been officially named Microsoft Office 2003. My advice is to stay tuned to the Office Developer Center to find out more about the next release of Office.

Personally, I have been working with the betas for many months and I have to tell you that this is going to be an impressive release of Office. It's hard to know where to start. There are new features that will appeal to users, enterprises, and ISVs. Because of all the excitement, I'm going to spend the next few months focusing on the new features and more importantly, the types of solutions enabled with Microsoft Office 2003.

Note

The information in this article is based on Office 2003. As always, information can change. I have made every effort to focus on things that I think will hold true when the product ships.

Introducing the Research Library

This month I am going to share with you what I consider one of the hidden gems of Microsoft Office 2003: the research library. Because there are so many other good things to talk about with regard to Office 2003, there has been little fanfare for the research library. The research library is available in the next version of Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Microsoft Outlook, Microsoft Publisher, Microsoft OneNote, and Internet Explorer. The research library is accessible as a task pane in each of these applications.

What is the research library? As you may have concluded from its name, it is a built-in research tool that allows users to search various information sources from within Office. Some of these research sources are built into Office and others are external services, therefore requiring a connection to the Internet to use them. Figure 1 shows the built-in English Thesaurus research service in action.

Figure 1. The Research task pane in Microsoft Word 2003

Dissecting the Research task pane as shown in Figure 1, we see the following elements:

  1. For most of the supported applications, you can access the research library by clicking the Research button on the Standard toolbar. Additionally, you can access the Research task pane by opening the task pane (View | Task Pane) and then selecting Research from the list of available task panes.

  2. After the Research task pane is open, type the search text in the Search for text box. Alternatively, a user can ALT+click text in a document. The Office application then opens the task pane and places the clicked text into the Search for text box.

  3. Next, select a research service from the list of services. Further along in this article I'll enumerate what some of these services are. Finally, click the Go button, which is the green button with the arrow to fire off the search.

  4. The Research task pane offers a Back and Forward button similar to a Web browser for reviewing previous searches.

  5. The Research task pane displays the results of the search. Multiple services can be used in a single search, and the results are separated by a header for each research service.

  6. The Research task pane supports many control types. In Figure 1, you see a drop-down menu that allows the selected text to be inserted or copied into the current document.

  7. At the bottom of the Research task pane, you'll notice the Research options... link. This opens the Research Options dialog box as shown in Figure 2.

Figure 2. The Research Options dialog box, used to configure the research services used by Office

The Research Options dialog box is used to add new research services, update existing services, modify Parental controls, and review the properties of installed services. As you can see in Figure 2, Office 2003 ships with a number of research services. The following list summarizes some of these services:

  • Local sources on the machine

    • Thesaurus and Translation

  • Office 2003 research services

    • Dictionary

    • Encyclopedia

    • Web Search

    • Stock Quotes

  • Third-party research services

    • Factiva

    • eLibrary

    • Gale

    • WorldLingo

    • and more to come

Most of these services require a live connection to the Internet. If the user is not connected, they will not be able to use some of the research services. The research services that require a live connection provide for current and accurate information from many useful resources from around the Web. One of my personal favorites is the MSN Money Stock Quotes research service. Figure 3 shows the results of searching on the stock ticker MSFT.

Figure 3. Example of using the MSN Money Stock Quotes research service

When you compare the results of Figure 1 and Figure 3, showing two different research services, you begin to get the idea about the flexibility in displaying various types of data. The MSN Money Stock Quotes shows a formatted table, hyperlinks that launch off to the MSN Web site for charting, and a button labeled Insert Stock Price. The Insert Stock Price button is actually a smart tag. Research task panes can optionally use the actions of installed smart tags to increase the interactivity between the Research task pane and the Office application. In this case, the MSN Money Stock Quotes can insert the current stock price or a formatted table with current stock statistics.

Build Your Own Research Library Service

Now for the best part: the Research task pane is extensible. Microsoft will be shipping an Office Research Software Developers Kit with the final product release. This allows developers to integrate third-party products and internal systems information, and allows ISVs to integrate their product lines into the Microsoft Office environment.

For the rest of the article, I'll discuss the basics of building a research library service. The sample is based on the Google Web APIs, which is a set of XML Web services that allow a software developer to query more than 3 billion Web documents directly from their own application. In our case, we're going to query Google from the Research task pane. For more information on the details of using the Google Web APIs, visit http://code.google.com/.

What is the scenario for the Google research library service? I often find myself launching Internet Explorer and going to Google to search their site for information that is in a document or e-mail I am working on. Now I can do the search right from within Office using the code sample we are going to discuss. I can ALT+click a word or words in an e-mail. The Research task pane opens with my search text already in the Search for text box. I then select the Google Web Search service and hit go. Next, the Research task pane displays the results of my search. I can then click one of the results and the Web site is loaded for me in the browser. See Figure 4 for a snapshot of the Google Web Search research library.

Figure 4. Snapshot of the Google Web Search research library

The download accompanying this article contains the Google Web Search research library service, tested under Office 2003 Beta 2. There are two versions, one written in C# and another in Visual Basic .NET. Beyond the syntax used, they are identical. An interesting side note: I first wrote this in C# and then converted it to Visual Basic .NET, which took about 10 minutes. For me, this proved it is not difficult to move between the languages. For the sake of space, I will primarily use the C# example for code references in this article.

Finally, to access the Google Web APIs service, you must create a Google account and obtain a free license key. Your Google account and license key entitle you to 1,000 automated queries per day. You can request this key at http://code.google.com/. This process only takes a few minutes. Then, depending on which language version of the source code you are going to use, open the web.config file in that version's directory and change the GoogleKey custom property. The following shows the directories and language versions:

<<Unzip Directory>>\GoogleResearchLibraryCSharp  - C# Version 
<<Unzip Directory>>\GoogleResearchLibraryVBNet – VB.NET Version

Put your license key assigned to you by Google into the value attribute:

<add key="GoogleKey" value="[INSERT YOUR KEY HERE - SEE README]"/>

As an example, the modified XML node should look like:

<add key="GoogleKey" value="abc123ThisIsAMadeUpKeyxyz/abc"/>

See the included README.DOC file for additional installation steps.

The Basics of the Research Library Architecture

The research library primarily consists of an XML Web service with two Web methods. This is a departure from many other Office extensibility models. In the past, extending Office often required installing bits on the client. The research library service is a standard XML Web service and requires no client-side code unless you plan to use a smart tag. Smart tags are completely optional, and many research libraries will not need them. Using Visual Studio .NET or another XML Web service enabled platform, you create a set of Web services that run on a server and communicate with Office through predefined interfaces. Figure 5, taken from the Office Research SDK, illustrates the communication flow between Office and the research library XML Web service.

Figure 5. Illustration of the communication flow between Office and the research library XML Web service

  1. A user selects the Add Service button from the Research Options dialog box (Figure 2). The user provides the link to the Registration page of the research library service. The registration page has a Web method named Registration().

  2. Registration() returns information about the provider of the research service, the type of service, and the service category and pointers for the Query() Web method used during searching. Now the service is ready to be used.

Note You may be thinking this is too much work for your users, who may not be inclined to go through the trouble of adding a service. With this in mind, there are other options: (1) you can deploy a setup file that installs the appropriate registry entries for the research library, or (2) you can use the discovery mechanism that Office includes for adding and updating research library services. When an organization deploys Office 2003, it can include a pointer in the setup to a discovery XML Web service. This allows new services to be added without requiring user intervention at the desktop.

  3. User requests from the research library are sent to the Query() Web method of the research library service being accessed.

  4. The Query() method parses the request and formulates a response. The response is sent back to Office, which then displays the results.

The communication between Office and the research XML Web service must conform to a set of XML schemas defined in the Office Research SDK. For example, the registration response packet based on the response schema may look like the following:

<?xml version="1.0" encoding="utf-8"?>
<ProviderUpdate xmlns="urn:Microsoft.Search.Registration.Response">
    <Status>SUCCESS</Status>
    <Providers>
    <Provider>
        <Message>This is a sample research library </Message>
        <Id>{9FF837AF-34D6-4a94-BB52-B0F19F3A343A}</Id>
        <Name>OfficeZealot.com</Name>
        <QueryPath>
        http://localhost/googleresearchlibrarycsharp/Query.asmx
        </QueryPath>
        <RegistrationPath>
        http://localhost/googleresearchlibrarycsharp/registration.asmx
        </RegistrationPath>
        <AboutPath/>
        <Type>SOAP</Type>
        <Services>
            <Service>
            <Id>{CD144577-9D90-4144-AE38-0D6553CA4004}</Id>
            <Name>Google Web Search (CSharp)</Name>
            <Description>Google Web Service research library</Description>
            <Copyright>All content Copyright OfficeZealot.com (c) 
            2003.</Copyright>
            <Display>On</Display>
            <Category>RESEARCH_GENERAL</Category>
            </Service>
        </Services>
    </Provider>
    </Providers>
</ProviderUpdate> 

This information is communicated to the user as they install the research library service. It also contains the information Office needs to know how to communicate with the research library service on the remote server.
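Because Office depends on this packet to locate the service, it can help to confirm that a registration response is well-formed and carries the expected paths before pointing Office at it. The following is a minimal sketch of such a check, not part of the download. The RegistrationResponseReader class and its ReadValue method are hypothetical names; the namespace URI and element names are the ones shown in the sample packet above.

```csharp
using System;
using System.Xml;

// Hypothetical helper for inspecting a registration response packet.
public static class RegistrationResponseReader
{
    // Namespace taken from the sample registration response.
    private const string RegistrationNs =
        "urn:Microsoft.Search.Registration.Response";

    // Returns the trimmed text of the first element matched by xpath,
    // or an empty string when no such element exists.
    // LoadXml throws XmlException if the packet is not well-formed.
    public static string ReadValue(string responseXml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(responseXml);

        // The packet uses a default namespace, so XPath needs a prefix for it.
        XmlNamespaceManager nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("ns", RegistrationNs);

        XmlNode node = doc.SelectSingleNode(xpath, nsm);
        return node == null ? "" : node.InnerText.Trim();
    }
}
```

For example, calling RegistrationResponseReader.ReadValue(packet, "//ns:QueryPath") on the sample packet should return the Query.asmx address without the surrounding whitespace, and a stray character that breaks the XML surfaces as an XmlException rather than as a silent failure in the Research task pane.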

Diving into the XML Web Service

As mentioned, a research library primarily consists of an XML Web service with two Web methods. Both of these Web methods can coexist in one Web service file. However, I prefer to logically and physically separate them to simplify the code. Therefore, the Google Web Service research service has two key files:

  1. The Registration.asmx Web service file for the Registration() method

  2. The Query.asmx Web service file for the Query() Web method

Office expects these Web methods to respond according to the research library schemas and their associated namespace. Therefore, each XML Web service must be defined with the appropriate namespace for a response, which is urn:Microsoft.Search. In C#, you use the following attribute for the Web service definition:

[WebService(Namespace="urn:Microsoft.Search")]

For Visual Basic .NET:

<WebService(Namespace:="urn:Microsoft.Search")>

This tells the .NET Framework to enclose the SOAP message bodies in the urn:Microsoft.Search namespace.

The following is the C# code for the Registration() Web method, stored in Registration.asmx.cs:

//Cache this for a full day; it rarely changes (86400 seconds = 24 hours)
[WebMethod(CacheDuration=86400)]
public string Registration(string registrationxml)
{
    string physicalPath = 
        HttpContext.Current.Server.MapPath(".").ToString();
    string httpPath = 
        ConfigurationSettings.AppSettings["ServerPath"]  
        + HttpContext.Current.Request.ApplicationPath + "/";

    XmlDocument registrationResponse = new XmlDocument();
    registrationResponse.Load(physicalPath + 
    "\\RegistrationResponse.xml");
           
    XmlNamespaceManager nsm = 
        new XmlNamespaceManager(registrationResponse.NameTable);
    nsm.AddNamespace("ns", "urn:Microsoft.Search.Registration.Response");

    registrationResponse.SelectSingleNode("//ns:QueryPath", nsm).InnerText = 
        httpPath + "Query.asmx";
    registrationResponse.SelectSingleNode("//ns:RegistrationPath", 
        nsm).InnerText =  httpPath + "Registration.asmx";
    registrationResponse.SelectSingleNode("//ns:AboutPath", nsm).InnerText =
        httpPath + "about.asmx";

    return registrationResponse.InnerXml.ToString();
}

The registration response is rather simple, as it only defines information about the research library service. The code loads a local XML file named RegistrationResponse.xml that functions as an overall template for the response and inserts additional information determined at runtime. This is returned to Office.

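The following is the code for the Query() Web method and one of its dependent private functions. The listing uses physicalPath, httpPath, and PageCount, which are not declared in this excerpt and are presumably defined elsewhere in the class: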

[WebMethod(CacheDuration=3600)]  //Cache for 1 hour
public string Query(string queryXml)
{
    //Verify that a query value has been specified
    if (queryXml.Length == 0)
        return "";

    string queryString;
    string applicationName;
    int startAt = 1;

    XmlDocument requestXml = new XmlDocument();
    try
    {
        requestXml.LoadXml(queryXml.ToString());

        XmlNamespaceManager nsmRequest = 
            new XmlNamespaceManager(requestXml.NameTable);
        nsmRequest.AddNamespace("ns", "urn:Microsoft.Search.Query");
        nsmRequest.AddNamespace("oc", 
           "urn:Microsoft.Search.Query.Office.Context");

        queryString = requestXml.SelectSingleNode("//ns:QueryText", 
             nsmRequest).InnerText;
        applicationName = requestXml.SelectSingleNode("//oc:Name", 
            nsmRequest).InnerText;
        try
        {
            startAt = Convert.ToInt32(requestXml.SelectSingleNode("//ns:StartAt", 
                nsmRequest).InnerText.ToString());
        }
        catch
        {
            startAt = 1;
        }
    }
    catch
    {
        //Parsing queryXML has failed, use the input string for the google search
        queryString = queryXml;
    }
                
    XmlDocument responseWrapper = new XmlDocument();
    try
    {
        responseWrapper.Load(physicalPath + "\\ResponseWrapper.xml");
    }
    catch
    {
        //Cannot parse the response body wrapper, return nothing
        return "";
    }
    XmlNamespaceManager nsmResponse = 
        new XmlNamespaceManager(responseWrapper.NameTable);
    nsmResponse.AddNamespace("ns", "urn:Microsoft.Search.Response");

    responseWrapper.SelectSingleNode("//ns:Range",nsmResponse).InnerXml = 
        QueryGoogle(queryString, startAt);
    return responseWrapper.InnerXml.ToString();
}

private string QueryGoogle(string queryString, int startAt)
{
    StringWriter queryResponse = new StringWriter();
    XmlTextWriter writer = new XmlTextWriter(queryResponse);

    GoogleSearchService search = new GoogleSearchService();
    try 
    {
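        // Invoke the search method. Google's start parameter is a zero-based
        // index, while StartAt from Office is 1-based, so shift it by one.
        GoogleSearchResult results = 
            search.doGoogleSearch(ConfigurationSettings.AppSettings["GoogleKey"] , 
            queryString, Math.Max(startAt - 1, 0), PageCount, true, "", true, "", "", "");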

        //Elements for Previous | Next links
        writer.WriteElementString("StartAt",startAt.ToString());
        writer.WriteElementString("Count",PageCount.ToString());
        writer.WriteElementString("TotalAvailable", 
            results.estimatedTotalResultsCount.ToString());

        //Start Results Element
        writer.WriteStartElement("Results");
        //Start Content element
        writer.WriteStartElement("Content", 
            "urn:Microsoft.Search.Response.Content");

        //Insert Image
        writer.WriteStartElement("Image");
        writer.WriteAttributeString("source", httpPath + "/gLogo.gif");
        writer.WriteEndElement();

        foreach(ResultElement result in results.resultElements)
        {
            writer.WriteStartElement("Heading");
            writer.WriteAttributeString("collapsible","true");
            
            if (result.title.Length> 0)
                writer.WriteElementString("Text", 
                    StripHtml(result.title).Trim());
            else
                writer.WriteElementString("Text", "No Title");

            if (result.snippet.Length > 0)
                writer.WriteElementString("P",
                    StripHtml(result.snippet));

            if (result.summary.Length>0)
            {
                writer.WriteStartElement("P");
                    writer.WriteStartElement("Char");
                    writer.WriteAttributeString("light", "true");
                    writer.WriteString("Description: ");
                    writer.WriteEndElement();
                    writer.WriteString(StripHtml(result.summary));
                writer.WriteEndElement();
            }

            writer.WriteStartElement("Hyperlink");
            writer.WriteAttributeString("url", result.URL);
            writer.WriteElementString("Text", result.URL);
            writer.WriteEndElement();

            writer.WriteEndElement(); //Heading
        }

        writer.WriteElementString("HorizontalRule","");
        writer.WriteStartElement("Hyperlink");
            writer.WriteAttributeString("url", 
                "http://www.google.com/search?q=" + queryString);
            writer.WriteElementString("Text", 
                "Continue your search at Google.com ...");
        writer.WriteEndElement();
        writer.WriteStartElement("Hyperlink");
            writer.WriteAttributeString("url", 
                "http://groups.google.com/groups?q=" + queryString);
            writer.WriteElementString("Text", 
                "Search at groups.Google.com ...");
        writer.WriteEndElement();
        //Overview of search results
        writer.WriteElementString("P", "Search took " + 
            Math.Round(results.searchTime,2).ToString() + " seconds.");
        writer.WriteElementString("HorizontalRule","");
        writer.WriteStartElement("Image"); //Insert OZ logo Image
        writer.WriteAttributeString("source", httpPath + "/oz15owide.gif");
        writer.WriteEndElement();

        writer.WriteEndElement(); //Close Content element
        writer.WriteEndElement(); //Close Results element

        writer.Close();

        return queryResponse.ToString();
    }
    catch (System.Web.Services.Protocols.SoapException ex) 
    {
        return "";
    }
}

When a user makes a request for information through the Research task pane, the request is packaged up according to the schema for a query request. This information is then passed to the Query() Web method and is available for parsing in the queryXml parameter. Query() extracts the search term from queryXml and then forwards the search term to the QueryGoogle() function.
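The request-parsing step can be sketched in isolation, which makes it easy to try outside of Office. The code below is an illustration under stated assumptions, not the code from the download. The QueryRequestParser class and the SamplePacket string are hypothetical; only the element names QueryText and StartAt and the urn:Microsoft.Search.Query namespace are taken from the Query() listing above, and the real packet that Office sends contains considerably more.

```csharp
using System;
using System.Xml;

// Hypothetical helper that mirrors the parsing done in Query().
public static class QueryRequestParser
{
    private const string QueryNs = "urn:Microsoft.Search.Query";

    // Hand-written packet for illustration only. The element nesting is an
    // assumption; consult the Office Research SDK schemas for the real layout.
    public const string SamplePacket =
        "<QueryPacket xmlns=\"urn:Microsoft.Search.Query\">" +
        "<Query>" +
        "<Context><QueryText>Chris Kunicki</QueryText></Context>" +
        "<Range><StartAt>11</StartAt></Range>" +
        "</Query>" +
        "</QueryPacket>";

    // Loads the packet and returns the first node matching xpath, or null.
    private static XmlNode Find(string queryXml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(queryXml); // throws XmlException for non-XML input
        XmlNamespaceManager nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("ns", QueryNs);
        return doc.SelectSingleNode(xpath, nsm);
    }

    // Returns the search text, or the raw input when it is not a query
    // packet, which is the same fallback the Query() listing uses.
    public static string ExtractQueryText(string queryXml)
    {
        try
        {
            XmlNode node = Find(queryXml, "//ns:QueryText");
            return node == null ? queryXml : node.InnerText;
        }
        catch (XmlException)
        {
            return queryXml;
        }
    }

    // Returns the requested start position, defaulting to 1 when the
    // element is missing, not a positive number, or the input is not XML.
    public static int ExtractStartAt(string queryXml)
    {
        try
        {
            XmlNode node = Find(queryXml, "//ns:StartAt");
            int startAt;
            if (node != null && int.TryParse(node.InnerText, out startAt) && startAt > 0)
                return startAt;
        }
        catch (XmlException)
        {
            // Fall through to the default below.
        }
        return 1;
    }
}
```

Catching XmlException specifically, rather than every exception as the listing does, keeps unrelated failures visible while preserving the fallback to the raw search text.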

QueryGoogle() does the heavy lifting for this solution. It calls the Google Web Service API with the search term as its input. The response from Google is then parsed and repackaged according to the research library response schema. The following XML packet is an example of the response that is shown in Figure 4:

<?xml version="1.0" encoding="utf-8"?>
<ResponsePacket revision="1" xmlns="urn:Microsoft.Search.Response">
<Response domain="{CD144577-9D90-4144-AE38-0D6553CA4004}">
    <Range>
        <StartAt>1</StartAt>
        <Count>2</Count>
        <TotalAvailable>738</TotalAvailable>
        <Results>
        <Content xmlns="urn:Microsoft.Search.Response.Content">
            <Image source="http://localhost/GoogleResearchLibraryCSharp//gLogo.gif"/>
            <Heading collapsible="true">
            <Text>Smart Solutions Opinion</Text>
            <P>Office as Swiss Army Knife. Chris Kunicki.. </P>
            <Hyperlink url="http://www.msofficemag.net/opinion/default.asp?sort=W&amp;Ord=A">
                <Text>http://www.msofficemag.net/opinion/default.asp?sort=W&amp;Ord=A</Text>
            </Hyperlink>
            </Heading>
            <Heading collapsible="true">
            <Text>What's New with Smart Tags in Office 11</Text>
            <P>Rate this page: 4 users, 3.8 out of 5. Read User ... </P>
            <Hyperlink url="http://msdn.microsoft.com/columns/office.asp">
                <Text>http://msdn.microsoft.com/columns/office.asp</Text>
            </Hyperlink>
            </Heading>
            <HorizontalRule/>
            <Hyperlink url="http://www.google.com/search?q=Chris Kunicki">
            <Text>Continue your search at Google.com ...</Text>
            </Hyperlink>
            <Hyperlink url="http://groups.google.com/groups?q=Chris Kunicki">
            <Text>Search at groups.Google.com ...</Text>
            </Hyperlink>
            <P>Search took 0.19 seconds.</P>
            <HorizontalRule/>
            <Image source="http://localhost/GoogleResearchLibraryCSharp//oz15owide.gif"/>
        </Content>
        </Results>
    </Range>
    <Status>SUCCESS</Status>
    </Response>
</ResponsePacket>
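One detail worth noting in the packet above: the two Google links carry the raw search text, space included (q=Chris Kunicki). A search term containing characters such as & or # would change the meaning of the URL. The following is a minimal sketch of building those links with the search term percent-encoded. The GoogleLinks class and its method names are hypothetical and not part of the download; Uri.EscapeDataString is a standard .NET Framework method (version 2.0 and later).

```csharp
using System;

// Hypothetical helper for building the "continue your search" links.
public static class GoogleLinks
{
    // Returns a Google Web search link with the search term percent-encoded.
    public static string WebSearchUrl(string queryString)
    {
        return "http://www.google.com/search?q=" +
            Uri.EscapeDataString(queryString);
    }

    // Returns a Google Groups search link with the search term percent-encoded.
    public static string GroupsSearchUrl(string queryString)
    {
        return "http://groups.google.com/groups?q=" +
            Uri.EscapeDataString(queryString);
    }
}
```

With this approach, GoogleLinks.WebSearchUrl("Chris Kunicki") yields http://www.google.com/search?q=Chris%20Kunicki. The resulting string can be passed to WriteAttributeString as before, because the writer handles XML escaping separately from URL encoding.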

Query.asmx makes heavy use of the XmlTextWriter class, which provides an efficient, lightweight way to construct a well-formed XML document. Because the response from the Query interface must be XML that conforms to the research library schema, a malformed document will likely leave the Research task pane empty. The following example illustrates the use of XmlTextWriter:

writer.WriteStartElement("P");
writer.WriteStartElement("Char");
writer.WriteAttributeString("light", "true");
writer.WriteString("Description: ");
writer.WriteEndElement();
writer.WriteString(StripHtml(result.summary));
writer.WriteEndElement();

This code creates the following XML, in which only the label is wrapped in the Char element:

<P><Char light="true">Description: </Char>information from google</P>

At first glance, this may appear to be a lot of code to create a simple XML string. However, a typical response packet can be 50-100 lines of XML. This gets to be rather cumbersome to manage with generic string handling routines. The XmlTextWriter class provides not only a memory-efficient way of constructing this XML response packet, but also one that is easier to create and manage through code.
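A further reason to prefer the writer over string concatenation is that it escapes reserved characters for you. The sketch below builds one Hyperlink element the same way QueryGoogle() does and returns it as a string. The ResponseFragments class and its Hyperlink method are hypothetical names used only for this illustration; the element and attribute names are the ones used in the listings above.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper that writes a single Hyperlink element.
public static class ResponseFragments
{
    // Builds <Hyperlink url="..."><Text>...</Text></Hyperlink>.
    // XmlTextWriter escapes characters such as & in both the attribute
    // value and the element text, so the fragment stays well-formed.
    public static string Hyperlink(string url, string text)
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);

        writer.WriteStartElement("Hyperlink");
        writer.WriteAttributeString("url", url);
        writer.WriteElementString("Text", text);
        writer.WriteEndElement(); // Hyperlink

        writer.Close();
        return buffer.ToString();
    }
}
```

Calling ResponseFragments.Hyperlink("http://example.com/?a=1&b=2", "Tom & Jerry") returns a fragment in which each & has been written as &amp;, the same escaping visible in the first Hyperlink of the sample response packet. Getting this right by hand for every result is exactly the kind of bookkeeping that string routines make error-prone.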

Taking Your Research Service Worldwide

If you want your research service to be made available to millions of Microsoft Office customers, you can apply for a listing on the Microsoft Office Marketplace. Your research service can be listed on a Web page that is accessible from a direct link in the Research Task Pane. This Web page is only accessible to customers coming from the Research Task Pane as shown in Figure 6, so the referrals are highly targeted.

Figure 6. Research Task Pane

Visit the Microsoft Office Marketplace to learn more about this opportunity.

Notes from the Trenches

As you can imagine, we have only scratched the surface of what is possible. As an early adopter of this new technology, I thought I would take the last few paragraphs to share a handful of observations about the experience of building a research task pane.

The experience is similar to learning HTML, as the research library is a response through a Web service formatted in an XML structure defined by Microsoft. It took me about two days of looking through their research library schemas to gain a comfort level with the syntax. Even though the Research Library task pane is similar to an embedded browser, it does not have the flexibility of a browser. You cannot set the background color, you have limited control over the layout of text and images, and there is no client-side script. Even so, once you know what the syntax allows, you'll find there are many things you can do.

Another interesting point is that you have to learn to deal with limited screen real estate. You are dealing with an area that is typically 100-200 pixels wide. Therefore, you have to learn to economize in how you present information. In some ways, you can compare this to formatting information for a PDA as they are a similar form factor.

One benefit of using the .NET Framework for XML Web services is the impressive built-in caching technology. By adding the following attribute to the Registration() Web method, we cache the registration response, which does not change frequently:

//Cache this for a full day; it rarely changes (86400 seconds = 24 hours)
[WebMethod(CacheDuration=86400)]
public string Registration(string registrationxml)

I also applied this to the Query() method and set the cache time to 1 hour, as shown in the code below:

[WebMethod(CacheDuration=3600)]  //Cache for 1 hour
public string Query(string queryXml)

In addition, how do you debug a research library? Again, it helps to compare this to debugging a browser-based HTML application; in this case, however, the Office client is the browser. Set your breakpoints in the Web methods you want to debug and then run the project with debugging turned on. Visual Studio .NET will launch an instance of Internet Explorer and place the code into debug mode. Keep the Internet Explorer window open and switch to the Office application you are testing. As you start running search requests through the research library, Visual Studio .NET will pause at the breakpoints.

Finally, a few random notes:

  • The research library supports collecting a richer set of information through a forms syntax.

  • Secure connections through https:// are supported.

  • The research library supports Windows authentication.

  • Cookie support:

    • Can write cookies that are recognized by the browser.

    • Shares persistent cookies with Internet Explorer.

About the Author

Chris Kunicki works with customers, architects, and engineers to build cool desktop, enterprise and Web applications at OfficeZealot.com. Chris is a long time enthusiast of Office development and continues evangelizing Office as an important platform for building solutions by writing and speaking to users and developers. You can reach him at chris@officezealot.com. Check out his slant on things at http://www.officezealot.com.