Converting Microsoft Office Binary Files to Open XML Formats

Office 2010

  Introduces tools, strategies, and resources for converting Microsoft Office binary files to Open XML formats.

Converting Microsoft Office documents from binary formats such as .doc, .ppt, and .xls to Open XML formats can improve compatibility with newer versions of Office and Office-compatible software. Converting can also improve compliance with official standards and simplify interoperability with internal applications.

You can convert individual files by opening them in a current Microsoft Office application and saving them in the new format, but this becomes cumbersome for large file sets. To convert groups of files, you can use existing tools or create your own. This article describes both options.
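
Whichever route you choose, a group conversion usually starts with an inventory of the binary files and the names their converted copies will get. The following is a minimal sketch of that step, not part of any Microsoft tool. The class and method names are hypothetical, and it assumes that macro-free targets (.docx, .xlsx, .pptx) are wanted; files that contain macros would need the macro-enabled formats instead.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ConversionPlanner
{
    // Binary extension -> Open XML extension. Macro-enabled targets
    // (.docm, .xlsm, .pptm) are deliberately left out of this sketch.
    private static readonly Dictionary<string, string> TargetExtensions =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".doc", ".docx" },
            { ".xls", ".xlsx" },
            { ".ppt", ".pptx" },
        };

    // Returns the Open XML path for a binary file, or null if the
    // extension is not one of the three binary formats listed above.
    public static string GetTargetPath(string sourcePath)
    {
        string target;
        return TargetExtensions.TryGetValue(Path.GetExtension(sourcePath), out target)
            ? Path.ChangeExtension(sourcePath, target)
            : null;
    }

    // Walks a folder tree and pairs each binary file with its target path.
    public static IEnumerable<KeyValuePair<string, string>> Plan(string rootFolder)
    {
        foreach (string path in Directory.EnumerateFiles(rootFolder, "*", SearchOption.AllDirectories))
        {
            string target = GetTargetPath(path);
            if (target != null)
                yield return new KeyValuePair<string, string>(path, target);
        }
    }
}
```

Files that are already in an Open XML format are skipped because their extensions are not in the lookup table.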

Most binary-to-XML conversions of Microsoft Office files can be done with existing tools. This is by far the easiest approach. If you need to convert a group of documents quickly, you should use a tool that has a command-line interface.

Office File Converter

You can use the Office File Converter (OFC), which is part of the 2007 Microsoft Office System Migration Guidance: Microsoft Office Migration Planning Manager, to translate groups of documents from binary to Open XML formats. To run Office File Converter, customize the ofc.ini file to point to the directory with the files to convert, then run ofc.exe. If you use an alternate .ini file, include the path to the file as a command parameter. For more information, see the documentation that comes with the Office File Converter download, or read the blog post Converting Office documents to Open XML.
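
If you want to start OFC from your own code, for example as one step in a larger migration script, the call might look like the following sketch. The class name and all paths are hypothetical. It assumes only what is described above: ofc.exe reads its default ofc.ini when started without arguments and accepts the path to an alternate .ini file as its parameter. Quoting the path is an assumption made to cope with spaces; check the OFC documentation for the exact command-line syntax.

```csharp
using System.Diagnostics;

public static class OfcLauncher
{
    // Builds the start info for ofc.exe. When iniPath is null or empty,
    // OFC is started without arguments and uses its default ofc.ini.
    public static ProcessStartInfo BuildStartInfo(string ofcExePath, string iniPath)
    {
        var info = new ProcessStartInfo(ofcExePath) { UseShellExecute = false };
        if (!string.IsNullOrEmpty(iniPath))
        {
            // Quoted in case the path contains spaces (an assumption, see above).
            info.Arguments = "\"" + iniPath + "\"";
        }
        return info;
    }

    // Starts OFC and blocks until the conversion run has finished.
    public static void Run(string ofcExePath, string iniPath)
    {
        using (Process process = Process.Start(BuildStartInfo(ofcExePath, iniPath)))
        {
            process.WaitForExit();
        }
    }
}
```

A call such as OfcLauncher.Run(@"C:\OFC\ofc.exe", @"C:\OFC\finance.ini") would then run the converter with the settings in that alternate file.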

Office Compatibility Pack

If you are using Microsoft Office 2000, Microsoft Office XP, or Microsoft Office 2003, you can use the Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint File Formats to convert your own individual files to Open XML formats. The Office Compatibility Pack does not support bulk conversions.

You can also use the Office Compatibility Pack to open, edit, and save files from other users that are already in Open XML formats.

You can write your own binary-to-Open XML conversion tools by using Office Automation or Microsoft Word Automation Services, or by working directly from the binary file format specifications. The following table shows the advantages and limitations of each technology.

Note

The Open XML SDK 2.0 for Microsoft Office is not included in the table because it does not work with binary files.

Comparison of Code-Based Conversion Technologies

| Technology | Runs On | Advantages | Limitations |
| --- | --- | --- | --- |
| Office Automation | Client | Easiest to use; works with all formats | Requires client application to run |
| Microsoft Word Automation Services | Microsoft SharePoint Server | Easy to use; can convert all documents across a SharePoint farm | Only works on Microsoft Word files; requires SharePoint Server |
| Binary File Format specifications | All | Most flexible | Complex; high development cost |

Office Automation

Office Automation is a framework and object model available through Visual Studio Tools for Office (VSTO) that enables developers to programmatically run Office client applications and execute commands within those applications. It is easy to use and well-suited to batch conversions. It requires loading files in the client application, which may limit performance over large document stores.

The following code, run from within a VSTO application, opens a binary Word document and saves it in Open XML format with a new name.

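// Keep a reference to the opened document so that SaveAs applies to it, not to the host document.
var doc = this.Application.Documents.Open(@"C:\BinaryDocument.doc");
doc.SaveAs(@"C:\OpenXmlDocument.docx", WdSaveFormat.wdFormatXMLDocument);
doc.Close();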
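
The Open and SaveAs calls above handle a single file. For a batch, it can help to wrap them in a loop that keeps going when one document fails. The following sketch shows one possible shape for such a loop. BatchConverter and the convertOne delegate are hypothetical names; the delegate stands in for the Word automation calls so that the loop itself has no dependency on Office.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchConverter
{
    // Calls convertOne(source, target) for every source path and returns
    // the paths that failed, each with its error message.
    // convertOne is a placeholder for the Open/SaveAs/Close sequence.
    public static Dictionary<string, string> ConvertAll(
        IEnumerable<string> sourcePaths, Action<string, string> convertOne)
    {
        var failures = new Dictionary<string, string>();
        foreach (string source in sourcePaths)
        {
            string target = Path.ChangeExtension(source, ".docx");
            try
            {
                convertOne(source, target);
            }
            catch (Exception ex)
            {
                // Keep going: one corrupt or locked file should not stop the batch.
                failures[source] = ex.Message;
            }
        }
        return failures;
    }
}
```

Collecting failures instead of stopping at the first one matters for large document stores, where a few damaged or password-protected files are common. Reusing one Word instance for the whole batch, rather than starting it per file, is also likely to help with the performance limits mentioned earlier.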

Word Automation Services

Microsoft Word Automation Services is a feature of Microsoft SharePoint Server 2010 that emulates some aspects of Word Automation on the server. You can use Word Automation Services to convert files between supported formats across a SharePoint farm. Supported formats include .pdf, .rtf, .doc, and .docx.

The following example loads a binary .doc file and creates an Open XML .docx file with the same content. It does not delete or overwrite the original.

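// The argument is the name of the Word Automation Services service application.
ConversionJob job = new ConversionJob("Word Automation Services");
job.Name = "Binary to OpenXML";
// Submit the job under the current user's credentials.
job.UserToken = myWebsite.CurrentUser.UserToken;
// Map the source file to a .docx output (assumes ".doc" appears only as the extension).
job.AddFile(fileName, fileName.Replace(".doc", ".docx"));
// Queue the job; the conversion itself runs asynchronously on the server.
job.Start();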

Converting by Reading the Binary Data

In the rare case where none of the existing tools or development platforms meet your needs, you can work directly from the binary data as specified in the Microsoft Office File Format Documents. First, create a model of the relevant Open XML schema in memory, then read the binary file stream and extract the data to fill in the schema piece by piece.
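
A natural first step when reading the stream yourself is to confirm which container a file uses. The sketch below checks the leading bytes: Office 97-2003 binary files are stored as Compound File Binary streams, which begin with the signature D0 CF 11 E0 A1 B1 1A E1, while Open XML files are ZIP packages that normally begin with "PK" followed by 03 04. The type and member names are hypothetical. The check identifies the container only, since other file types use the same containers, and it does not parse any document content.

```csharp
using System;
using System.IO;

public enum OfficeContainer { Unknown, BinaryCompoundFile, ZipPackage }

public static class FormatSniffer
{
    // Signature at the start of a Compound File Binary stream (.doc, .xls, .ppt).
    private static readonly byte[] CompoundFileSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive; Open XML packages are ZIP files.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static OfficeContainer Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop
        // until the buffer is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }
        if (StartsWith(header, read, CompoundFileSignature)) return OfficeContainer.BinaryCompoundFile;
        if (StartsWith(header, read, ZipSignature)) return OfficeContainer.ZipPackage;
        return OfficeContainer.Unknown;
    }

    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Checking the signature rather than the file name extension guards against files that were renamed without being converted.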

Whether you are trying to improve interoperability, comply with standards, or simplify document access for internal applications, choosing the right tool or API will help you get it done.

© 2014 Microsoft