Using Office Open XML to Save Time Without Writing Code
Summary: This article explores the structure of an Open XML Format document ZIP package to introduce ways in which you can save time when editing documents by editing the file directly in the ZIP package—in some cases without even having to read or write any Open XML code. (8 printed pages)
Stephanie Krieger, arouet.net, Microsoft Office MVP
Applies to: Microsoft Office Word 2007, Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007 (Open XML information also applies to Microsoft Office 2008 for Mac programs Word, Excel, and PowerPoint)
You may already know that the Open XML Formats open a new world of extensibility opportunities that enable power users to take documents, presentations, and workbooks beyond the user interface of the 2007 Microsoft Office system programs. But did you know that you can reap some of those benefits without writing even one line of Open XML or VBA code?
In fact, you can use the Open XML Formats to save time on many different types of tasks just by understanding the construction of an Open XML Format ZIP package. For example, consider the following scenario:
You have a 100-page proposal about to go out the door when you learn that the client’s logo that you’ve used in the document is obsolete. That logo appears in the document dozens of times but the proposal is already overdue. Wouldn’t it be nice if you could just paste the updated logo once and Word 2007 would automatically replace it everywhere the original logo had been and even format it correctly throughout the document?
Okay, so you may be thinking that it would also be nice if your cat knew how to cook dinner, but that's equally unlikely. Well, Open XML can't send your cat to culinary school but you can indeed paste that logo once and let Word do the rest of the work. To accomplish this, you just have to paste it in the ZIP package rather than in the document.
This article introduces you to what you can (and can’t) do in a ZIP package without editing any Open XML and uses the preceding example to walk you through some hands-on practice.
Although this article covers fairly basic Open XML topics and is written for advanced users of the 2007 Office system programs, some familiarity with the concepts of the Open XML Formats is assumed. For help filling in gaps or reviewing basics, see Additional Resources. Also note that, although this article focuses on documents created using the 2007 Office system, the Open XML information in this article also applies to Microsoft Office 2008 for Mac, which uses the same file formats as Office 2007.
Even if you are not going to write any Open XML, it’s important to understand the structure of the ZIP package before you start to make changes in your own document ZIP packages. So, let's take a quick look.
When you explore the contents of Open XML Format ZIP packages for Word, PowerPoint, and Excel 2007 documents (as shown in Figures 1 through 3), you see that the package structures are very similar.
Note that, if you open the ZIP package of one of your own Open XML Format documents to explore its contents, you may see differences in the files and folders contained in the main document folder from those shown in Figures 1 through 3. For example, the media folder only exists in these three ZIP packages because a picture (or other type of media object) exists in the document. Similarly, if your document included charts or SmartArt diagrams, you would see folders for those objects as well. Your ZIP package may also contain other files (also known as parts), such as headers, footers, or comments in a Word document.
However, many elements of the ZIP package structures are identical. At the top level of every Open XML Format ZIP package, you see a _rels (relationships) folder, a docProps (document properties) folder, and a main document folder named for the applicable program. This level of the structure also includes the [Content_Types].xml file, which provides content type definitions for the file formats and for many of the document parts included in the ZIP package structure.
In the main document folder for each package, you see a combination of folders and XML document parts, which vary by both the program and the file contents. However, notice also that each of the main document folders includes a _rels folder.
Your document always contains _rels (relationships) folders at the top level and in the main document folder, and may contain several others as well. This is because a relationship has to be defined for every file and any external references in your document. It is these relationship definitions that tell the ZIP package how the pieces fit together to form your document. So, adding, deleting, or renaming any file in your ZIP package always requires editing a relationship file.
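If you would rather inspect relationships with a script than by eye, the following minimal Python sketch shows what a .rels part contains. The sample XML is a hypothetical stand-in shaped like word/_rels/document.xml.rels, and the function name list_relationships is an illustrative choice, not part of any Office tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the .rels parts in an Open XML package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

def list_relationships(rels_xml: bytes):
    """Return (Id, Type, Target) for each relationship in a .rels part."""
    root = ET.fromstring(rels_xml)
    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
    ]

# Hypothetical sample content, shaped like word/_rels/document.xml.rels.
SAMPLE_RELS = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.jpeg"/>
</Relationships>"""

for rel_id, rel_type, target in list_relationships(SAMPLE_RELS):
    # Show only the last segment of the type URI to keep the output short.
    print(rel_id, rel_type.rsplit("/", 1)[-1], target)
```

Each entry pairs an Id (which the document part uses to refer to the file) with a Target path, which is why renaming or removing a file means updating the matching Target.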
To determine if you need to edit Open XML in order to make the change you want to make in a ZIP package, remember the following:
If you replace a file in an Open XML Format ZIP package with one that has exactly the same file name, type, and location within the package, you often do not need to edit any Open XML. However, you still need to edit relationships if the contents of the replacement file require it (for example, if you replace a slide layout XML part in a PowerPoint ZIP package with one that has additional images).
If you add a new file format that does not already exist in the ZIP package, you must add a content type definition to the file [Content_Types].xml. Many (but not all) types of files that you may add to a ZIP package require the addition of a content type definition as well.
Learn how to edit relationships and content type definitions later in this article.
To open an Open XML Format document ZIP package, you can either change the file extension of your document to .zip or use a utility that lets you explore the package without changing the extension, which can be a nice timesaver. Several such utilities are available, including the open source program 7-Zip for Windows users or the BetterZip program for Mac users.
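A scripting language can also open the package without any renaming. As a rough sketch, Python's standard zipfile module reads a .docx, .pptx, or .xlsx file directly. The demo below builds a tiny in-memory stand-in so that it runs on its own; the part names in it and the file name mentioned in the docstring are hypothetical.

```python
import io
import zipfile

def list_parts(package) -> list[str]:
    """Return the names of all parts in an Open XML package.

    `package` may be a path such as "proposal.docx" (hypothetical name)
    or a file-like object; no renaming to .zip is needed.
    """
    with zipfile.ZipFile(package) as zf:
        return sorted(zf.namelist())

def make_demo_package() -> io.BytesIO:
    """Build a tiny in-memory stand-in for a real package (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in ("[Content_Types].xml", "_rels/.rels",
                     "word/document.xml", "word/media/image1.png"):
            zf.writestr(name, b"")
    buf.seek(0)
    return buf

for name in list_parts(make_demo_package()):
    print(name)
```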
When your Office 2007 document includes a picture, that picture is only stored in the ZIP package once regardless of how many times it appears in the document. Therefore, you can replace the picture file once in the ZIP package to replace it everywhere the original appears in the document. In fact, because you only replace the image file, the image formatting throughout the document is retained as well.
Sound too easy? Must be a catch? Well, there is just one. As explained in the preceding section of this article, the new picture file format must be the same as the original, or you will need to edit a bit of Open XML. (Learn how to do that in the next section of this article.)
Before you edit a ZIP package, it's a good idea to always make a copy of the document. Remember that, if you make an error when replacing files or editing Open XML, you might not be able to open the document.
To replace an image with another image of the same file type
In the document ZIP package, open the main document folder (i.e., word, ppt, or xl) and then open the media subfolder.
If your document contains more than one image, you will see a file in the media subfolder for each image. If so, determine which file is the one that you want to replace.
Rename the new image file so that its name exactly matches the name of the file that it replaces in the ZIP package. For example, if the file in the ZIP package that you want to replace is image1.png, rename the file that you are going to replace it with to image1.png.
If the file type is the same but the extension is slightly different, match the extension to that of the file already in the ZIP package. For example, if the current image is image1.jpeg and your new image has the extension .jpg, change the extension to .jpeg so that the file names match precisely. However, if the file is a different format from the original, do not change the file extension. Instead, take the additional steps in the next section of this article to properly replace the file.
Drag the new image into the ZIP package, replacing the existing file of the same name.
Open your document and take a look. You will see that the image was replaced wherever the original image appeared and that all formatting has been retained.
If the new image has different proportions from the original, you may need to fix the proportions of the image wherever it appears in the document. This is because all formatting information is retained in the document for the original image, even the size. So, an image with different proportions may appear distorted. But, don't lose heart—there’s no manual labor required here. A very little bit of Microsoft Visual Basic for Applications (VBA) will fix that up in a jiffy. (Find the VBA for getting this done later in this article.)
If your image appears on a worksheet and in the header of an Excel workbook, this is an exception where you may need to replace the image file twice. Images inserted into Excel headers and footers are stored separately from images inserted into the body of an Excel sheet.
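Dragging the file into the package is all the procedure above requires, but if you need to repeat it across many documents, the same swap can be scripted. The following is a minimal Python sketch under stated assumptions: the function name replace_part and the part names are hypothetical, and because the zipfile module cannot overwrite an entry in place, the sketch writes a new copy of the package instead.

```python
import io
import zipfile

def replace_part(src, dst, part_name: str, new_bytes: bytes) -> None:
    """Copy package `src` to `dst`, swapping the content of one part.

    The part keeps its name and location, so no Open XML is edited.
    Writing to a separate `dst` also leaves the original as a backup.
    """
    with zipfile.ZipFile(src) as zin:
        if part_name not in zin.namelist():
            raise KeyError(f"{part_name} is not in the package")
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for item in zin.infolist():
                if item.filename == part_name:
                    data = new_bytes          # the replacement image
                else:
                    data = zin.read(item)     # every other part, unchanged
                zout.writestr(item.filename, data)

# Demo with an in-memory stand-in for a real document (placeholder bytes).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("word/document.xml", b"<doc/>")
    zf.writestr("word/media/image1.png", b"old logo bytes")
original.seek(0)

updated = io.BytesIO()
replace_part(original, updated, "word/media/image1.png", b"new logo bytes")
```

With real files, src and dst would be paths to the original document and to the new copy, and new_bytes would be the contents of the new image file.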
If the new image is a different file type from the original, start by following the steps in the preceding section to add the new image file to the ZIP package. Remember to delete the original image from the package, since it will not be replaced when you drag the new image into the package (because of the different file extension). Then, take the following steps to update the file format information and the image file name throughout the ZIP package:
If another image file of the same type does not already exist in the document, add a content type definition for the new image file type. To do this, open the file [Content_Types].xml and then copy and change an existing image file type definition. For example, if you already have a definition for jpeg, like so:
<Default Extension="jpeg" ContentType="image/jpeg"/>
Duplicate it to create the definition for the png format as follows:
<Default Extension="png" ContentType="image/png"/>
Even if no other images of the original file type (such as jpeg) exist in the document, there is no reason to delete the definition in this file for that extension.
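For readers who prefer to script the content type step, here is a minimal Python sketch that adds a Default definition only when one for that extension is missing. The function name ensure_default and the sample XML are illustrative assumptions. Note also that the output is re-serialized by ElementTree, so the XML declaration and whitespace may differ from what Office writes.

```python
import xml.etree.ElementTree as ET

# Namespace of the [Content_Types].xml part.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

def ensure_default(content_types_xml: bytes, extension: str,
                   content_type: str) -> bytes:
    """Add <Default Extension=... ContentType=.../> if it is not present."""
    # Keep the namespace unprefixed when the XML is written back out.
    ET.register_namespace("", CT_NS)
    root = ET.fromstring(content_types_xml)
    defaults = root.findall(f"{{{CT_NS}}}Default")
    known = {d.get("Extension", "").lower() for d in defaults}
    if extension.lower() not in known:
        new = ET.Element(f"{{{CT_NS}}}Default",
                         {"Extension": extension, "ContentType": content_type})
        root.insert(0, new)  # placed first, alongside the other definitions
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# Hypothetical sample shaped like a small [Content_Types].xml.
SAMPLE_TYPES = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="jpeg" ContentType="image/jpeg"/>
  <Default Extension="xml" ContentType="application/xml"/>
</Types>"""

updated_types = ensure_default(SAMPLE_TYPES, "png", "image/png")
print(updated_types.decode("utf-8"))
```

Calling the function a second time with the same extension leaves the file unchanged, which makes it safe to run across a batch of documents.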
Edit any relationship definitions for the image to use the new file extension. For example, if the original relationship looks like the following and your new image is image1.png, change the file extension where it is written in this relationship:
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.jpeg"/>
So, where do you find the relationships that you need to edit? Well, you may need to edit relationships in one or more files, depending on where the image appears in the document and the type of document. For example:
If the image appears in the body of a Word document, the relationship to edit is in the file document.xml.rels that appears in the _rels folder within the word folder. However, if the same image also appears in one or more document headers, you'll see .rels files in the same folder for those headers and will need to edit the relationships there as well.
If the image appears on a PowerPoint slide layout, open the _rels folder inside the slideLayouts folder (which is inside the ppt folder) and then find the .rels file for the correct layout(s). Note that, if the image also appears on a slide or a slide master, you need to edit the .rels files in those folders as well.
If the image appears on an Excel worksheet, the .rels file to edit is in the _rels folder inside the drawings subfolder (within the xl folder). If the image also appears in a header or footer, you see two .rels files in this folder to update.
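Because the same image can be referenced from several .rels files, a small script can help you avoid missing one. The sketch below is a rough Python illustration that rewrites the Target of any relationship pointing at the old file name; you would run it on the contents of each .rels file described above. The function name retarget is hypothetical, and the sample reuses the relationship shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace used by the .rels parts in an Open XML package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

def retarget(rels_xml: bytes, old_name: str, new_name: str) -> tuple[bytes, int]:
    """Point relationships whose Target ends in `old_name` at `new_name`.

    Returns the updated XML and the number of relationships changed.
    """
    # Keep the namespace unprefixed when the XML is written back out.
    ET.register_namespace("", RELS_NS)
    root = ET.fromstring(rels_xml)
    changed = 0
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        # Split "../media/image1.jpeg" into its folder and file name.
        head, sep, file_name = rel.get("Target", "").rpartition("/")
        if file_name == old_name:
            rel.set("Target", head + sep + new_name)
            changed += 1
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True), changed

SAMPLE_LAYOUT_RELS = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.jpeg"/>
</Relationships>"""

new_rels, count = retarget(SAMPLE_LAYOUT_RELS, "image1.jpeg", "image1.png")
print(count, "relationship(s) updated")
```

The change count makes it easy to see which .rels files actually referenced the image and which did not.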
Files with the extension .rels are XML files. So, you can open and edit them in an XML editor. If you don't have an XML editing program, just use Windows Notepad to edit these or any XML document part. (Or, if you are familiar with the basics of XML structure and prefer to use a structured editor, try the free download XML Notepad 2007.)
That's all there is to it. Your new image is ready to go.
As noted earlier, if your new image has different proportions from the original, it may look distorted in the document. Of course, you can manually edit the proportions for each instance of the image using the Size dialog box (note that the name of that dialog box varies by program). But, if you wanted to do that much work you wouldn't be reading this article, right? So, instead, try just a bit of VBA to take care of all the images at once.
For example, the following code example is all you need to fix the proportions for all images throughout all parts of your Word document:
Sub FixDimensions()
    'Declare a range variable to loop through all story ranges
    'in the document (such as the body and headers).
    Dim oRg As Range
    'If your images have text wrapping enabled, use the Shape object
    'instead of the InlineShape object here.
    Dim oIp As InlineShape
    For Each oRg In ActiveDocument.StoryRanges
        For Each oIp In oRg.InlineShapes
            With oIp
                'Set the scale so that the larger dimension uses the scale
                'of the smaller. This is so that your new image fits in the
                'allotted space.
                If .ScaleHeight < .ScaleWidth Then
                    .ScaleWidth = .ScaleHeight
                Else
                    .ScaleHeight = .ScaleWidth
                End If
                'Set 'lock aspect ratio' to true to retain the new proportions.
                .LockAspectRatio = msoTrue
            End With
        Next oIp
    Next oRg
End Sub
If you're new to VBA but like the sound of how much time you can save on tasks like this, see the Additional Resources at the end of this article for some basic VBA resources to help you get started.
In this article, you've seen examples of simple and slightly more involved edits that you can make in the ZIP package to save time.
Of course, after you see how much time you can save (and how much fun you can have) exploring the XML under the hood of your documents, don't forget that the simplest solution to any task is always the best—and that isn't always found under the hood or in code. For example, if you need to replace an image with one of a different file format but the image only appears once or twice in the document, it may be faster to insert the new image into the document (and copy the formatting from the original image using Format Painter) than to spend time editing Open XML relationships.
That said, once you get comfortable editing the ZIP package, you may be surprised at just how much time you can save on many tasks by going under the hood. For example, the next time you need to update the theme or style definitions for a set of documents, remember that you can replace a file of the same name and type in the same location within your ZIP package without requiring additional edits. This applies to files such as theme1.xml (in Word, Excel, or PowerPoint packages) or styles.xml (in Word packages).
So, dive into a ZIP package, explore, and see what else you can do. For more fun and easy examples of how you can use Open XML to get more from your Microsoft Office documents, see the articles Think Outside the UI: Using Open XML to Customize Formatting in the 2007 Office System and Getting More from Document Themes in the 2007 Office System with Office Open XML.
Stephanie Krieger is a Microsoft Office System MVP and the author of two books, Advanced Microsoft Office Documents 2007 Edition Inside Out and Microsoft Office Document Designer. As a professional document consultant, Stephanie helps many global companies develop enterprise solutions for Microsoft Office on both platforms. She also frequently writes, presents, and creates content for Microsoft. You can reach Stephanie through her blog, arouet.net.