A Primer to the Office XP Primary Interop Assemblies
October 3, 2002
Summary: Learn how to use the Office XP Primary Interop Assemblies to extend and automate Office XP features from Microsoft Visual Studio .NET code projects. (15 printed pages)
In September 2002, Microsoft® released the Office XP Primary Interop Assemblies. These primary interop assemblies (PIAs) enable the solution developer to leverage all the new capabilities of the Microsoft .NET Framework and to access Microsoft Office in a reliable and consistent way. In this article, I will discuss the importance of the Office XP PIAs, show you the basics of how to start using the Office XP PIAs in your solutions, review a few key .NET terms along the way, and finally walk through a couple samples that illustrate a .NET application automating Microsoft Office.
This article is far from an exhaustive analysis of using the Office XP PIAs; however, you probably don't need one. I will provide a number of useful references within this article that you can explore. For the most part, if you are comfortable with the .NET Framework, which is well covered here on MSDN, you will find the Office XP PIAs to be relatively straightforward to use. For me personally, this has been my experience. I have found that in a matter of just a few minutes after installing the Office XP PIAs and reading the included Readme file, I was automating Office from within Microsoft Visual Studio® .NET (which is very cool!).
As a brief reminder to the reader, I do not work for Microsoft. Therefore, even though I work closely with our friends in Redmond, like you my perspective is looking from the outside in. I hope that as you read this you will not quickly dismiss some of my observations as marketing rhetoric. My opinion is that .NET is important to the future of desktop productivity, and this flies in the face of some industry pundits who question or criticize whether there is a future for the desktop.
For this reason, it is important that developers who currently or historically have leveraged Office for solutions explore .NET. Microsoft .NET addresses a number of shortcomings experienced in Office development. For example, some of these issues have revolved around little innovation for server or Internet deployment of Office solutions, the aptly named "DLL hell" problem, challenges in code reusability, and limitations of Visual Basic® for Applications (VBA).
At the same time, today's pure Web browser-based interfaces can be limited for various types of solutions and provide little flexibility for the user. Microsoft Office has become the user tool of choice for creating content and analyzing information. Because Office falls into the average user's comfort zone, Office will contribute to a superior experience for users and a tangible return on investment for the next generation of .NET applications that leverage Internet standards-based distributed computing and the raw horsepower and intelligence of the desktop computer.
Why Office and .NET?
Why Office and .NET? Let's rethink that for a second. How could Microsoft not incorporate .NET into Office?
Microsoft has made a huge investment on .NET declaring it to be the successor to its prior development models. Office is a cornerstone of the Microsoft vision of computing. For Office to go forward, it must build on the new foundation of .NET. For most of us this is welcome progress; for a small minority, they seem to cling to the old ways and are uncertain about this change. It seems that even those that have been lukewarm to .NET are coming around. If years of consulting and specializing in Microsoft technologies has taught me anything, it is that when Microsoft sets its mind to succeed with something as significant as .NET, the industry will eventually fully embrace the technology.
So, what does this have to do with Microsoft Office? Office is literally used by millions of people everyday and has often been the key product that Microsoft uses to demonstrate some of its most progressive thinking in desktop innovation and productivity. Because of this track record, many of us have been expecting (hoping, wishing, begging) that Microsoft would bring Office along for the .NET ride. Over the last year, the Office gurus here at MSDN have done an excellent job showing how Office and .NET can work together (see Paul Cornell's last OfficeTalk column as a proof point). The Office XP PIAs, however, represent a much more significant strategic statement from Microsoft.
A Brief History Lesson
From a historical perspective, Microsoft has often used an incremental approach to bringing new technologies to existing products. Do you recall in which product VBA was first released? You might think that all the Office products would have incorporated VBA at the same time, but they didn't. Microsoft Excel 5.0 was the first Office product to support VBA. Over the next few releases of Office, Microsoft expanded support for VBA until all the products supported VBA.
Why this incremental approach? Mainly because it takes a significant effort to integrate new technology without breaking backward compatibility and to perform exhaustive testing. So, instead of waiting for all the products to have VBA integration, which would have taken many years, Microsoft released it in Microsoft Excel 5.0 with a promise that support would expand into other products. Within in a few years, Microsoft was able to fully deliver on this plan. This approach encouraged early adoption of VBA by developers and ISVs. In time, it became the standard way to make a desktop product extensible. For example, Microsoft Visio® added support for VBA (well before Microsoft purchased Visio) and so did Autodesk into their AutoCAD product.
So what does this tell us? We see the same incremental approach with Office and .NET. First, MSDN authors started to cover the benefits of teaming Office with .NET. Now, Microsoft has released the official primary interop assemblies for Office XP. The dominoes have started to drop. By releasing the PIAs, Microsoft is making the strategic statement and is providing tangible evidence to the public that Office has a future in .NET land. We, therefore, can expect PIAs for future versions of Office and we can anticipate future enhancements to the Office and .NET development story. Keep your eye on this column, because over the next few months, Paul Cornell and I will start to discuss some of the innovations for the next version of Office.
Well, that is enough pondering for now on the future; let's focus on the present and get into the Office XP PIAs.
Introducing the Office XP PIAs
So what are the Office XP PIAs? To answer this question, it is important to have a few basic .NET concepts clearly in mind. Let's quickly review some of these concepts.
First, we need to understand the term interop assembly. An interop assembly is a specialized .NET assembly that contains metadata that defines COM types that enable .NET compilers to resolve calls to COM objects. There is an important distinction to remember—an interop assembly only contains definitions of COM types, as it has not modified the original COM library (the actual binary representation). Let me simplify that definition—an interop assembly contains the needed information that allows .NET managed code that runs under the common language runtime to call unmanaged code in Microsoft Office using the various Microsoft Office COM-based object model APIs.
How do you create an interop assembly? COM components such as Microsoft Office COM-based object model APIs expose type information using type libraries, but .NET compilers do not understand COM type libraries. They only understand .NET metadata. Therefore, the key to making a COM component available to the world of .NET is a mechanism that takes a COM type library and produces equivalent metadata. The common language runtime execution engine contains this functionality, which is called the type library importer. There are two methods to using the .NET type library importer:
- The TLBIMP.EXE utility included with the .NET SDK.
- When you use the Add Reference dialog box to reference a COM component from Visual Studio .NET.
Regardless of which method you use, they each produce a new assembly with a .DLL extension. This assembly is used by your .NET application to automate Office and not the COM type library.
How does an interop assembly differ from that of a primary interop assembly? For the most part, they are almost identical with the exception of a few noteworthy differences. A primary interop assembly is the official interop assembly released by a software vendor. In the case of the Microsoft Office XP PIAs, Microsoft is the owner of the Office COM-based object model and they have released the official primary interop assembly. The following two sections are quoted from an article available on MSDN entitled Primary Interop Assemblies, which nicely clarifies the purpose of a PIA:
What is a PIA?
Like any other managed assembly, an interop assembly is a collection of types that are deployed, versioned, and configured as a single unit. However, unlike other managed assemblies, an interop assembly contains type definitions (not implementation) of types that have already been defined in COM. These type definitions allow managed applications to bind to the COM types at compile time and provide information to the common language runtime about how the types should be marshaled at run time.
While any number of interop assemblies may exist that describe a given COM type, only one interop assembly is labeled the PIA. The PIA contains the official description of the types as defined by the publisher of those types. The PIA may contain certain customizations that make the types easier to use from managed code. The PIA is always signed by the publisher of the original COM type.
Any interop assembly that is not provided by the publisher of the COM types is considered unofficial and should be avoided. Because the types defined in such an assembly are not to be signed by the publisher of the PIA, they are incompatible with the definitions provided in the PIA.
Why are PIAs important?
PIAs are important because they provide unique type identity. The PIA distinguishes the official type definitions from counterfeit definitions provided by other interop assemblies. Having a single type identity ensures type compatibility between applications that share the types defined in the PIA. Because the PIA is signed by its publisher and labeled with the PrimaryInteropAssembly attribute, it can be differentiated from other interop assemblies that define the same types.
Another difference of a PIA over an interop assembly is that the COM library owner can customize the PIA by renaming or hiding certain elements in the assembly and add attributes to change marshaling behavior between .NET and COM. Herein lies the importance of using the Microsoft Office XP PIAs over one you would generate yourself with TLBIMP.EXE or Visual Studio .NET. Even though these tools described thus far do a good job in typical scenarios, it is not realistic to expect them to understand the numerous nuances of the deep and extensive Office object models. Microsoft has customized the PIAs to improve interoperability.
So in summary, that was the long version of convincing you that:
- You need an interop assembly for Office and .NET development.
- You should use the primary interop assemblies from Microsoft because they are official.
- The official PIAs are tweaked for better interoperability.
- Microsoft has tested the Office XP PIAs and they are likely to be better than ones you or I would generate on our own.
Installing the Office XP PIAs
After downloading the Office XP PIAs and extracting them to a directory, you will notice about 43 files. Most of them are files with a .DLL extension. These files are the actual interop assemblies. Additionally, there are a number of files to help you install the PIAs into the .NET global assembly cache along with a Readme file that discusses the installation process. The Office XP PIAs include support for Microsoft Access, Microsoft Excel, Microsoft FrontPage®, Microsoft Outlook®, Microsoft PowerPoint®, Microsoft Publisher, Microsoft Visio, Microsoft Word, and other supporting libraries used by Office.
I encourage you to read the article Working with the Office XP Primary Interop Assemblies. This article thoroughly covers installing the PIAs and various options for distribution. I do not want to rehash the details that are well covered in that article, but I will review a few key points.
You have a couple options when installing the PIAs . First, you can install the PIAs into the .NET global assembly cache (GAC). The GAC is a machine-wide repository for assemblies that are shared by multiple applications. The GAC is found in the WINNT\Assembly or WINDOWS\Assembly directory. A Microsoft Windows® shell extension provides us with a specialized view of this directory as shown in Figure 1.
Figure 1. The global assembly cache
Once an assembly is in the GAC, this assembly is available to any application (assuming the security settings permit it). Another option is to use what has been affectionately called the XCOPY approach. The XCOPY approach means that you simply copy the necessary PIA files into your project directory, set a reference to those PIA files, and you are ready to go. The XCOPY approach is one benefit of .NET over typical COM development. Once your application is completed, you simply include them with your application in the same directory in your deployment package.
Which approach to using the PIAs should you use? As with most things, it depends on your needs. Most enterprises will probably opt to deploy the Office XP PIAs to the GAC. This allows them to manage the deployment process and to leverage the assemblies in a shared way. This approach will simplify deploying applications because developers will not have to incorporate the PIAs into their install process. Additionally, this prevents disk space from being wasted because multiple copies of the assembly are not being deployed. I should mention though, even if this does happen, .NET assures that we will not have the old DLL hell problem of component conflicts.
For many scenarios, deploying the Office XP PIAs to the GAC will be the preferred method. However, this does not rule out the value of using the XCOPY approach. There are many time a developer does not have control over the desktop and cannot assure that the needed PIAs will be in the GAC. In addition, many companies do not appreciate third party applications making global system changes to their users' computers. Therefore, it may not be realistic for your setup package to install the Office XP PIAs into the GAC. In this case, where you cannot control the desktop and proactively want to mitigate the risk that the PIAs may not be on the target machine, you can include the individual PIA files needed for your solution with your application. It is refreshing to have flexible options for deploying the PIAs.
Add a Reference
In Visual Studio .NET, we need to add a reference to an assembly if we want to use it in our application. To add a reference to one of the Office COM objects, we can select the Project menu in Visual Studio.NET and then click Add Reference. The Add Reference dialog box can also be accessed by right mouse clicking on the References node of our application in the Solution Explorer window as shown in Figure 2.
Figure 2. Adding a reference to the solution
By selecting Add Reference, the Add Reference dialog box appears as shown in Figure 3.
Figure 3. The Add Reference dialog box
This dialog box is similar to the Add Reference dialog box we use in the Visual Basic 6.0 or VBA IDE in that it allows you to set a reference to external libraries to use in a project. The key difference in Visual Studio .NET is that you can set it not just to a COM library, but also to a .NET assembly.
When I first used this dialog box as shown in Figure 3, I felt confident I knew what it was doing, but it did take me a few attempts to get it clear in my mind. When you select a library in the .NET tab, you are setting a reference to a .NET assembly. I assumed that after I installed the Office XP PIAs into the GAC that they would appear under the .NET tab; they don't! To reference a COM object, you need to use the COM tab. In Figure 3, you can see that I am referencing the Microsoft Word 10.0 Object Library. When I click OK, Visual Studio .NET will look into the GAC to see if I have an interop assembly installed for the Microsoft Word 10.0 Object Library. If I do, the interop assembly in the GAC is referenced. If an interop assembly does not exist, Visual Studio .NET will then generate an interop assembly using the mechanism described earlier.
If you want to use the aforementioned XCOPY approach, copy the needed PIA assembly files into your project directory. Then use the Add Reference dialog box and click on the Browse button. Browse to the PIA assembly files in the project directory.
You can tell if a reference has been set to the GAC or to a local copy of the interop assembly by looking at the Path property in Visual Studio .NET for the referenced interop assembly. For example, after setting a reference to the Microsoft Word 10.0 Object Library with an interop assembly in the GAC, the Path property is set to:
Note The line break in the path above was inserted for readability.
When I used the XCOPY approach, the path property had the following value:
Sample #1: Using the Word Spell Check Engine
One popular request I get often and have observed in the newsgroups is the desire to leverage the spell check engine built-in to Office. Leveraging the Office spell check engine makes a lot of sense. First, if your users have Office on their machine, they already have a great spell check engine. This engine supports custom dictionaries from third parties and allows the user to extend the vocabulary from the Office user interface. The Office spell check engine is also available through the Office object model that the Office XP PIAs expose. The following code demonstrates a .NET console application that checks the spelling of a sentence from the command line:
Module SpellCheck Sub Main() Dim str As String = Trim(Replace(System.Environment.CommandLine, _ "spellcheck", "", , , CompareMethod.Text)) Dim aWords As Array = str.Split(Convert.ToChar(" ")) Dim sWord As String Console.WriteLine(vbCrLf & Chr(34) & str & Chr(34) _ & " is being spell checked. Please wait ....." & vbCrLf) 'Create new instance of Microsoft Word Dim objWord As New __ Microsoft.Office.Interop.Word.ApplicationClass() Dim objSpellingSuggestions As __ Microsoft.Office.Interop.Word.SpellingSuggestions Dim objSpellingSuggestion As __ Microsoft.Office.Interop.Word.SpellingSuggestion 'Word requires a document to be open to use the 'spell check engine. Add a blank document. objWord.Documents.Add() 'Iterate through each word For Each sWord In aWords 'Spell check the word objSpellingSuggestions = objWord.GetSpellingSuggestions(sWord) If objSpellingSuggestions.Count > 0 Then 'If a word is misspelled, suggest possible corrections Console.WriteLine(Chr(34) & sWord & Chr(34) _ & " is mispelled. Suggested corrections: ") For Each objSpellingSuggestion In objSpellingSuggestions Console.WriteLine(Space(2) & _ objSpellingSuggestion.Name) Next Console.WriteLine() End If Next 'Close Word down objWord.Quit() Console.WriteLine("Spell check is complete") End Sub End Module
Figure 4 shows this command line utility in operation. Obviously, the utility is a contrived sample, but using a console application allows us to simplify the code and focus on what we are interested in, namely automating Office with Visual Studio .NET.
Figure 4. The SpellCheck command line utility output
Looking at the code for this sample, we see the following logic:
- Access the command line for the string of text to be spell checked. Microsoft .NET makes this easy to do as the command line parameters are available through System.Environment.CommandLine.
- After stripping out unwanted characters, the string is converted into an array with each word in the string becoming an element of the array.
- Next an instance of Microsoft Word is created:
Dim objWord As New Microsoft.Office.Interop.Word.ApplicationClass()
- Microsoft Word requires a document to be loaded into the application to use the spell check engine. This is true even if you don't plan to do anything with the document. Therefore, the code calls the Application.Document.Add method and adds a blank document to Word.
- Next, each word in the array is stepped through and passed into the spell check engine through the GetSpellingSuggestions method. The GetSpellingSuggestions method returns a collection of SpellingSuggestion objects if a word is incorrectly spelled.
- For each incorrectly spelled word, the command line utility outputs a message to the console along with suggestions for correction.
A few observations about the code—if you're comfortable with the Office object model, a fair amount of this code will look familiar. This means that your experience with Office automation will apply to working with the Office XP PIAs. In many ways, the experience is reminiscent of automating Office from Visual Basic, in that once you get an object reference to an Application object for the host application, you can then walk the object model to access the objects, methods, properties, and events of interest.
You may also notice that when creating the instance of Microsoft Word that the object being created is of type Word.ApplicationClass. Shouldn't this be Word.Application? When I first started writing this code I was using Word.Application and everything was working well until I tried to call
objWord.Quit. After typing in
objWord.Quit, Visual Studio .NET reported that there was an error:
'Quit' is ambiguous across the inherited interfaces
What? I had no clue what this meant. Well, after a little digging around (about an hour later), I could see that two object members had the same method name and they were conflicting and that it only worked if I used Word.ApplicationClass. It just so happens that the Office gurus here at MSDN have documented this issue along with a number of others you need to know about in an article entitled Office XP Primary Interop Assemblies Known Issues. This article explains that "objects ending with the suffix Class should only be used when you are working with events or object members with ambiguous names. Otherwise, use objects that do not end with the suffix Class." I wish I had known that when I started.
The other problem I ran into that is covered by the article Office XP Primary Interop Assemblies Known Issues was that suddenly the Office XP PIAs installed in the GAC were not referenceable from my Visual Studio .NET project. Apparently, this can happen when a user or any of the Office applications initiate an Office setup action that causes the Office type library registration to be rewritten. The solution to this issue is to rerun the register.bat file included with the Office XP PIAs.
These are two issues out of a handful of oddities that I hope the Office team will address in future versions of the PIAs for Office.
Sample #2: Consume ADO.NET Datasets, Part 2
In the August Office Talk column, I wrote about a technique for Working with ADO.NET Datasets in Microsoft Office made available from a .NET XML Web service. The approach in that column featured a couple hundred lines of code in Visual Basic 6.0 using Microsoft XML Core Services 4.0. This is a good solution when the .NET Framework is not available on the target computer.
Now that the .NET Framework is popping up on my customer's computers and the Office XP PIAs were available, I was eager to convert this code to Visual Basic .NET. To my surprise, Visual Studio .NET allowed me to do this in about 44 lines of code after creating a Web reference to an XML Web service (the same XML Web service referenced in my August Office Talk column).
Let's take a look at the code:
Imports Microsoft.Office.Interop Module DataSet2Excel Sub Main() Dim ds As DataSet Dim dt As DataTable Dim ictrColumn As Integer Dim ictrRow As Integer 'Create instance of Excel Dim xl As New Excel.Application() xl.Workbooks.Add() xl.Visible = True xl.Range("A2").Value = "Retrieving Web Service Data. Please wait.." Try xl.ScreenUpdating = False 'Retrieve data from web service into a DataSet Dim objOrdHist As New com.fabrikam.services.OrderHistory() ds = objOrdHist.OpenOrders dt = ds.Tables(0) 'Insert Column names into Excel For ictrColumn = 0 To dt.Columns.Count - 1 xl.Range("A1").Offset(, ictrColumn).Value = _ dt.Columns(ictrColumn).ColumnName Next 'Insert DataSet data into Excel For ictrRow = 0 To dt.Rows.Count - 1 xl.Range("A2").Offset(ictrRow). _ Resize(1, ictrColumn).Value = _ dt.Rows(ictrRow).ItemArray() Next 'Format the data list in Excel With xl.Sheets("Sheet1").Range("A1") .AutoFilter() .AutoFormat(Excel.XlRangeAutoFormat. _ xlRangeAutoFormatList2) End With Catch e As Exception MsgBox(e.ToString, MsgBoxStyle.Critical, _ "Data2Excel Error") xl.ActiveWorkbook.Close(False) xl.Quit() Finally xl.ScreenUpdating = True End Try End Sub End Module
Again, I am using a .NET console application for convenience. The first line of the code creates a shortcut reference that imports namespace names from referenced projects and assemblies using the Imports keyword:
This makes it easier to reference object types from within the source code module and can make the source code easier to read. For example, I first create an instance of Microsoft Excel with:
Dim xl As New Excel.Application()
Without using the Imports keyword with a reference to the library, the Microsoft Excel instance would read:
Dim xl As New Imports Microsoft.Office.Interop.Excel.Application()
It still works, but gets very lengthy. As a comparison, review sample #1, in which Imports is not used. Looking at the code further, a new workbook is added to the instance of Microsoft Excel. Before disabling screen updating (to prevent screen flicker as data is inserted into the workbook), a message prompting the user to wait is inserted into the workbook:
xl.Range("A2").Value = "Retrieving Web Service Data. Please wait.."
This temporary message will be overwritten when the data from the XML Web service is inserted into the workbook. Figure 5 shows the output at this point in the code.
Figure 5. Microsoft Excel prompting user to wait while data is retrieved.
Next, the sample code fills a DataSet object with the results of data made available from the XML Web service. Notice that there is no XML parsing code to convert the XML Web service data to a dataset. Microsoft .NET supports remoting datasets through an XML Web service and is able to serialize and deserialize the dataset as XML. Then using the familiar Microsoft Excel Range object, the code inserts the column headings and associated data by looping through the table of data contained in the dataset. Finally, the workbook is formatted for better presentation as shown in Figure 6.
Figure 6. Workbook populated with data from an ADO.NET DataSet from an XML Web service
The combination of Microsoft Office and the .NET Framework has a lot of exciting potential. The Office XP PIAs are the first step in what I hope will be a beneficial relationship between Office and .NET for years to come. Take the time to start working with the Office XP PIAs and to read the other articles referenced in this article.
- Microsoft .NET/COM Migration and Interoperability
- Calling a .NET Component from a COM Component
- Calling COM Components from .NET Clients
- .NET Framework Developer's Guide: Design Considerations for Interoperation
Finally, I want to thank Paul Cornell and Siew-Moi Khor of Microsoft for their assistance with the Office XP PIAs while researching this article.
Chris Kunicki works with customers, architects, and engineers to build cool desktop, enterprise and Web applications at OfficeZealot.com. Chris is a long time enthusiast of Office development and has been evangelizing Office as an important platform for building solutions by writing and speaking to users and developers. You can reach him at email@example.com. Check out his slant on things at http://www.officezealot.com.