Converting Microsoft Office Binary Files to Open XML Formats
Introduces tools, strategies, and resources for converting Microsoft Office binary files to Open XML formats.
Last modified: Monday, March 9, 2015
Applies to: Excel 2010 | Office 2007 | Office 2010 | Open XML | PowerPoint 2010 | SharePoint Server 2010 | VBA | Word 2010
Applies to: Microsoft Office Excel 2003 | Microsoft Excel 2002 | Microsoft Excel 2000 | Microsoft Excel 97 | Microsoft Office PowerPoint 2003 | Microsoft PowerPoint 2002 | Microsoft PowerPoint 2000 | Microsoft PowerPoint 97 | Microsoft Office Word 2003 | Microsoft Word 2002 | Microsoft Word 2000 | Microsoft Word 97
Contents
Overview
Methods of Conversion
Conversion Tools
Office File Converter
Office Compatibility Pack
Code-Based Solutions
Comparison of Code-Based Conversion Technologies
Office Automation
Word Automation Services
Converting by Reading the Binary Data
Conclusion
Additional Resources
Overview
Converting Microsoft Office documents from binary formats such as .doc, .ppt, and .xls to Open XML formats can improve compatibility with newer versions of Office and Office-compatible software. Converting can also improve compliance with official standards and simplify interoperability with internal applications.
You can convert individual files by opening them in a current Microsoft Office application and saving them in the new format, but this is cumbersome for large file sets. To convert groups of files, you can use existing tools or create your own. This article describes both options.
Methods of Conversion
Most binary-to-XML conversions of Microsoft Office files can be done with existing tools. This is by far the easiest approach. If you need to convert a group of documents quickly, you should use a tool that has a command-line interface.
Conversion Tools
Office File Converter
You can use the Office File Converter (OFC), which is part of the 2007 Microsoft Office System Migration Guidance: Microsoft Office Migration Planning Manager, to translate groups of documents from binary to Open XML formats. To run the Office File Converter, customize the ofc.ini file to point to the directory that contains the files to convert, and then run ofc.exe. If you use an alternate .ini file, pass the path to that file as a command-line parameter. For more information, see the documentation that comes with the Office File Converter download, or read the blog post Converting Office documents to Open XML.
Office Compatibility Pack
If you are using Microsoft Office 2000, Microsoft Office XP, or Microsoft Office 2003, you can use the Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint File Formats to convert your own individual files to Open XML formats. The Office Compatibility Pack does not support bulk conversions.
You can also use the Office Compatibility Pack to open, edit, and save files from other users that are already in Open XML formats.
Code-Based Solutions
You can write your own binary-to-Open XML conversion tools by using Office Automation or Microsoft Word Automation Services, or by working directly from the binary file format specifications. The following table shows where each technology runs, along with its advantages and limitations.
Note
The Open XML SDK 2.0 for Microsoft Office is not included in the table because it does not work with binary files.
Comparison of Code-Based Conversion Technologies
Technology | Runs On | Advantages | Limitations
---|---|---|---
Office Automation | Client | |
Microsoft Word Automation Services | Microsoft SharePoint Server | |
Binary File Format specifications | All | |
Office Automation
Office Automation is a framework and object model available through Visual Studio Tools for Office (VSTO) that enables developers to programmatically run Office client applications and execute commands within those applications. It is easy to use and well-suited to batch conversions. It requires loading files in the client application, which may limit performance over large document stores.
The following code, run from within a VSTO application, opens a binary Word document and saves it in Open XML format with a new name.
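var document = this.Application.Documents.Open(@"C:\BinaryDocument.doc");
// Save the document that was just opened (not the host document), then close it.
document.SaveAs(@"C:\OpenXmlDocument.docx", WdSaveFormat.wdFormatXMLDocument);
document.Close();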
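For batch work outside a VSTO project, the same open-and-save pattern can be driven from a console application. The following is a minimal sketch rather than a definitive implementation. It assumes that Word is installed on the machine, and it uses late binding through the Word.Application ProgID so that no interop assembly reference is needed. The names BatchWordConverter, GetTargetPath, and ConvertFolder are hypothetical, and the constant 12 is the numeric value of WdSaveFormat.wdFormatXMLDocument.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class BatchWordConverter
{
    // Numeric value of WdSaveFormat.wdFormatXMLDocument.
    const int WdFormatXmlDocument = 12;

    // Hypothetical helper: maps "report.doc" to "report.docx" in the same folder.
    public static string GetTargetPath(string sourcePath)
    {
        return Path.ChangeExtension(sourcePath, ".docx");
    }

    // Converts every .doc file in one folder; existing .docx files are never overwritten.
    public static void ConvertFolder(string folder)
    {
        Type wordType = Type.GetTypeFromProgID("Word.Application");
        if (wordType == null)
            throw new InvalidOperationException("Word does not appear to be installed.");

        dynamic word = Activator.CreateInstance(wordType);
        try
        {
            word.Visible = false;
            foreach (string file in Directory.GetFiles(Path.GetFullPath(folder), "*.doc"))
            {
                // On the .NET Framework, "*.doc" can also match ".docx", so check the extension explicitly.
                if (!string.Equals(Path.GetExtension(file), ".doc", StringComparison.OrdinalIgnoreCase))
                    continue;

                string target = GetTargetPath(file);
                if (File.Exists(target))
                    continue;

                dynamic document = word.Documents.Open(file);
                try
                {
                    document.SaveAs(target, WdFormatXmlDocument);
                }
                finally
                {
                    document.Close(false); // close without saving changes to the source file
                }
            }
        }
        finally
        {
            word.Quit();
            Marshal.ReleaseComObject(word);
        }
    }
}
```

A call such as BatchWordConverter.ConvertFolder(@"C:\Legacy") would process a single folder. Because the .docx format cannot store macros, documents that contain macros would need the macro-enabled format instead; this sketch does not handle that case.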
For more information about Office Automation and file conversion, see the following resources.
Workbook interface (Microsoft.Office.Interop.Excel namespace)
Presentation interface (Microsoft.Office.Interop.PowerPoint namespace)
Word Automation Services
Microsoft Word Automation Services is a feature of Microsoft SharePoint Server 2010 that emulates some aspects of Word Automation on the server. You can use Word Automation Services to convert files between supported formats across a SharePoint farm. Supported formats include .pdf, .rtf, .doc, and .docx.
The following example loads a binary .doc file and creates an Open XML .docx file with the same content. It does not delete or overwrite the original.
ConversionJob job = new ConversionJob("Word Automation Services");
job.Name = "Binary to OpenXML";
job.UserToken = myWebsite.CurrentUser.UserToken;
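// ChangeExtension rewrites only the final extension; Replace would also alter ".doc" elsewhere in the URL.
job.AddFile(fileName, System.IO.Path.ChangeExtension(fileName, ".docx"));
// Start queues the job; the conversion itself runs asynchronously on the server.
job.Start();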
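Because conversions run asynchronously, the calling code usually needs a way to find out when a job has finished. The following sketch polls the ConversionJobStatus class from the Microsoft.Office.Word.Server.Conversions namespace. It is an illustration only: the class and method names (ConversionMonitor, WaitForJob) and the five-second polling interval are assumptions, and the code must run on a SharePoint server with a reference to the Word Automation Services assembly.

```csharp
using System;
using System.Threading;
using Microsoft.Office.Word.Server.Conversions;

static class ConversionMonitor
{
    // Returns true only if every item in the job converted successfully before the timeout.
    public static bool WaitForJob(Guid jobId, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            // Create a fresh status object on each pass to read the current counts.
            ConversionJobStatus status =
                new ConversionJobStatus("Word Automation Services", jobId, null);

            // The job is finished when every item has succeeded, failed, or been canceled.
            if (status.Count == status.Succeeded + status.Failed + status.Canceled)
                return status.Failed == 0 && status.Canceled == 0;

            Thread.Sleep(TimeSpan.FromSeconds(5)); // arbitrary polling interval
        }
        return false; // timed out while items were still pending
    }
}
```

With the job from the previous example, a call might look like ConversionMonitor.WaitForJob(job.JobId, TimeSpan.FromMinutes(30)). How long a job waits before it starts depends on how often the farm's Word Automation Services timer job is scheduled to run, so the timeout should be chosen with that schedule in mind.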
For more information about Word Automation Services, see the following resources.
ConversionJob class
Converting by Reading the Binary Data
In the rare case where none of the existing tools or development platforms meet your needs, you can work directly from the binary data as specified in the Microsoft Office File Format Documents. First, create a model of the relevant Open XML schema in memory, then read the binary file stream and extract the data to fill in the schema piece by piece.
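Before parsing a file, it can help to confirm which container it actually uses, because file extensions are not always reliable. The sketch below is an illustration under stated assumptions, not part of the format specifications themselves. It relies on two well-known signatures: the eight-byte header of the Compound File Binary format that holds .doc, .xls, and .ppt content, and the ZIP local file header that begins an Open XML package. The names OfficeContainer and OfficeFileSniffer are hypothetical.

```csharp
using System;
using System.IO;

enum OfficeContainer { Unknown, CompoundBinary, ZipPackage }

static class OfficeFileSniffer
{
    // Header signature of the Compound File Binary format used by .doc, .xls, and .ppt.
    static readonly byte[] CompoundSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive ("PK" 0x03 0x04), the container for Open XML.
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static OfficeContainer Detect(Stream stream)
    {
        byte[] header = new byte[CompoundSignature.Length];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop until full or end of stream.
        while (total < header.Length)
        {
            int read = stream.Read(header, total, header.Length - total);
            if (read == 0) break;
            total += read;
        }

        if (StartsWith(header, total, CompoundSignature)) return OfficeContainer.CompoundBinary;
        if (StartsWith(header, total, ZipSignature)) return OfficeContainer.ZipPackage;
        return OfficeContainer.Unknown;
    }

    public static OfficeContainer Detect(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            return Detect(stream);
        }
    }

    static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A result of CompoundBinary only identifies the container. Password-protected Open XML files are also stored in a compound file, so a converter would still need to inspect the streams inside before assuming that the content is in a legacy binary format.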
Conclusion
Whether you are trying to improve interoperability, comply with standards, or simplify document access for internal applications, choosing the right tool or API will help you get it done.
Additional Resources
For more information, see the following resources: