Using Office Open XML to Customize Document Formatting in the 2007 Office System
Summary: Introduces and demonstrates how to apply custom formatting not available from within the user interface (UI) to a document in the 2007 Microsoft Office system by using Office Open XML. (8 printed pages)
Stephanie Krieger, arouet.net, Microsoft Office MVP
Applies to: Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007, Microsoft Office Word 2007, Microsoft Office for Mac 2008
If you are a power user who creates documents by using Microsoft Office PowerPoint 2007, Microsoft Office Word 2007, and Microsoft Office Excel 2007 or Microsoft Office for Mac 2008, you most likely use features like document themes and OfficeArt graphics (such as SmartArt graphics, Excel 2007 charts, and shapes in PowerPoint 2007 and Excel 2007) all the time. And, if you're like me, you probably love what you can do with those powerful, flexible tools, but still run into situations where you want even more flexibility.
If you find yourself wanting to customize graphics and theme-aware formatting in ways that don't appear to be available, don't give up hope. After all, only so many options can appear in the user interface (UI) of any software program.
For a Microsoft Office power user, one of the best aspects of the Office Open XML Formats is that you can access and edit almost every bit of document content and formatting from within the XML. So, you often get additional formatting flexibility by editing the XML directly. For example, you only have to edit a value or two in the XML to work around any of the following apparent limitations:
You need a reflection height that's not available in the Reflection effects galleries for shapes, pictures, or text.
You want a tint or shade of a theme color other than what is available from the Theme Colors gallery. However, you can't set a custom color in the Colors dialog box because you need the color to remain responsive to theme switching.
You would like to use a different shape throughout your SmartArt graphic. But, you can't use the Change Shape feature to accomplish that task because you need that graphic to continue to use the new shape if edited or reset.
This article walks through the preceding examples to show you how to find, interpret, and customize formatting in your documents.
If you have not yet edited Office Open XML, the tasks in this article are probably easier than you expect. However, the article does assume that you have some familiarity with an Office Open XML Format document ZIP package, including what a ZIP package is and how to access the XML document parts that make up your Microsoft Office documents, presentations, and workbooks. Additionally, the article assumes that you have experience using and customizing OfficeArt graphics and document themes. For help filling in gaps on any of these topics, or to review some of the basics, see Additional Resources.
Microsoft Office for Mac 2008 uses the same file formats and most of the same formatting capabilities as the 2007 Office system. So, much of the content in this article can apply to documents created in Office for Mac 2008 as well. However, note that there are some differences in formatting capabilities. For example, you can customize reflections from within the UI in PowerPoint 2008. On the other hand, Excel 2008 does not support theme-aware formatting for worksheet content other than charts and graphics.
If you are new to editing Office Open XML, you might be wondering how long it will take you to learn enough about the language and structure to make even simple formatting changes. Well, if you are an experienced user of the 2007 Office system programs (or Office 2008 for Mac), you are likely to be very pleasantly surprised.
Say, for example, that you're working on a PowerPoint slide and the reflection height that you need for the rounded rectangle shape shown in Figure 1 is not available in the Shape Effects Reflection gallery. You can't customize reflection settings in PowerPoint 2007, so you venture into Office Open XML.
When you open the ZIP package, navigate to the slides subfolder (inside the ppt folder) and open the XML document part for the slide containing the shape you want to edit. Within the XML for that slide, you see the markup shown in Figure 2.
In the top of the markup outlined here, notice that you see the <p:sp> tag, referring to a PowerPoint shape. This figure shows just part of the markup for the shape, as you can tell because you do not see the </p:sp> end tag anywhere in this example. Let's take a closer look at the different sections of markup that are called out in this figure:
The <p:nvSpPr> tag refers to the non-visual shape properties. Notice the name attribute shown here. You can save time searching through the XML document part for the markup you need by getting that shape name while the presentation is still open in PowerPoint 2007. Then, just use the Find feature in whatever program you use to edit the XML (whether it's Windows Notepad, Microsoft Visual Studio 2008, or anything in between) to go directly to the markup for the shape you want to edit.
In PowerPoint 2007 or Excel 2007 you can select the shape and then open the Selection pane to see that shape's name. The Selection pane is available from the applicable contextual Format tab (such as Drawing Tools Format or Picture Tools Format) in the Arrange group.
Directly above the number 2 in the diagram, notice the <p:spPr> tag, indicating the beginning of the shape properties. The first properties shown here are those in the <a:xfrm> tag (OfficeArt transform), which include the position (<a:off> tag), measured from the left (x attribute) and top (y attribute) of the slide, and the size (<a:ext> tag), shown as width (cx attribute) and height (cy attribute).
The size and position values, like many OfficeArt values in Office Open XML, are measured in English Metric Units (EMUs). For information about this and other units of measure used in Office Open XML document markup, see How Many Units of Measure Are Enough?.
The <a:prstGeom> tag (preset geometry) shown here indicates the type of shape. The <a:avLst> tag (adjust value list) is empty because the shape's adjustment handles were not used to alter the shape. If the adjustment handles are used, a value appears for each handle that is moved.
The <a:gradFill> tag (gradient fill) contains a gradient stop list (<a:gsLst>), which contains color information for each gradient stop (<a:gs>).
The <a:noFill> tag within the <a:ln> tag indicates that there is no outline on this shape.
The <a:reflection> tag is nested within the <a:effectLst> tag. Glow, shadow, reflection, and soft edges effects appear nested in the effect list tag when applicable. (The bevel formatting that you see on the shape in Figure 1 is in a separate tag that appears below the portion of markup shown in Figure 2.)
The attributes shown in this reflection tag include, in order, the blur radius, start alpha, end alpha, and end position. Additional available attributes for this tag that are not shown here include distance (from shape) and rotation and alignment settings.
Alpha is % opacity, which is the way that transparency is displayed in Office Open XML markup. Percent opacity is the opposite of the transparency value (for example, 40% transparency is 60% alpha).
The end position is the attribute to edit when you want to change the height of the reflection. In this case, the end position is set to display 38.5% of the original shape in the reflection (displayed as 38500).
The reflection attribute values, and other percent values earlier in this markup (such as gradient stop positions and color luminance adjustments), are displayed as 100,000 times the actual value when that value is expressed as a fraction of 1. In other words, they are 1,000 times the percent, so 38.5% (0.385) appears as 38500.
After stepping through this markup, you can quickly see how intuitive the markup can be when you already know the features available to your Microsoft Office documents.
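If you prefer to script the edit rather than type it by hand, the reflection change described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the slide markup in SAMPLE_SLIDE is a hypothetical, heavily trimmed stand-in for a real slide part (the shape name and attribute values are illustrative), and set_reflection_height is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces used by PresentationML (p) and DrawingML (a).
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}
# Keep the familiar prefixes when the tree is written back out.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Hypothetical, trimmed slide markup; a real slide part contains much more.
SAMPLE_SLIDE = """<p:sld
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <p:cSld><p:spTree>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="4" name="Rounded Rectangle 3"/></p:nvSpPr>
      <p:spPr>
        <a:effectLst>
          <a:reflection blurRad="6350" stA="50000" endA="300" endPos="38500"/>
        </a:effectLst>
      </p:spPr>
    </p:sp>
  </p:spTree></p:cSld>
</p:sld>"""


def set_reflection_height(slide_xml, shape_name, percent):
    """Return slide_xml with the named shape's reflection endPos set to percent."""
    root = ET.fromstring(slide_xml)
    for shape in root.iter("{%s}sp" % NS["p"]):
        props = shape.find("p:nvSpPr/p:cNvPr", NS)
        if props is None or props.get("name") != shape_name:
            continue
        reflection = shape.find("p:spPr/a:effectLst/a:reflection", NS)
        if reflection is None:
            raise ValueError("shape has no reflection effect")
        # Percent values are stored as 1,000 times the percent (55% -> 55000).
        reflection.set("endPos", str(round(percent * 1000)))
        return ET.tostring(root, encoding="unicode")
    raise ValueError("no shape named %r" % shape_name)


updated = set_reflection_height(SAMPLE_SLIDE, "Rounded Rectangle 3", 55)
```

Note that ElementTree does not write an XML declaration when it returns a string, and a real slide part declares more namespaces than the two registered here, so treat this as a starting point rather than a finished tool.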
When you first start exploring Office Open XML document markup, it may seem like there's a different unit of measure for every value. Actually, there is really just a handful of measurement units used and they're easy to recognize when you know the document content. Following are common units of measure used in the XML markup of PowerPoint, Word, and Excel documents:
EMUs are used for many of the format settings in the XML for OfficeArt graphics (known as DrawingML). One inch is equal to 914,400 EMUs; one point is equal to 12,700 EMUs.
Many percent values are displayed in DrawingML as 100,000 times the actual value, including some of the values noted earlier from Figure 2.
DrawingML markup for OfficeArt uses the same tags, attributes, and units of measure whether the object is in a document in Word, Excel, or PowerPoint. However, DrawingML is not used for shapes in Word. As you may notice when working in Word, shapes in that program do not support the formatting available to OfficeArt in the 2007 Office system. (Note also that this is the same for Office 2008 for Mac.)
Many values are also displayed in the XML for PowerPoint presentations (known as PresentationML) as 100,000 times their value in the XML.
Values for some text formatting in PresentationML, such as font size, are displayed as 100 times their actual value.
Font size in the XML for Word documents (WordprocessingML) is displayed as twice its actual value. However, most document formatting in WordprocessingML is represented in twips. A twip is 1/20th of a point.
In a refreshing twist, many values in the XML for Excel workbooks (SpreadsheetML) are displayed as their actual value.
When you first start editing Office Open XML document markup, the easiest way to determine which unit of measure is used for a particular type of value is to simply take note of the actual value displayed in the UI and compare the two. For example, in Word, a one-inch page margin is displayed in the XML as 1440. If you know that there are 72 points in an inch, you can quickly see that 1440 is 20 times 72, indicating that margins are measured in twips.
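To make the conversions above concrete, here is a small set of Python helpers. The function names are my own invention for illustration; the conversion factors are the ones listed in this section.

```python
# Conversion factors taken from the units described above.
EMUS_PER_INCH = 914_400
EMUS_PER_POINT = 12_700
TWIPS_PER_POINT = 20
POINTS_PER_INCH = 72


def emu_to_inches(emu):
    """DrawingML sizes and positions: EMUs to inches."""
    return emu / EMUS_PER_INCH


def emu_to_points(emu):
    """DrawingML sizes and positions: EMUs to points."""
    return emu / EMUS_PER_POINT


def twips_to_inches(twips):
    """WordprocessingML measurements such as margins: twips to inches."""
    return twips / (TWIPS_PER_POINT * POINTS_PER_INCH)


def drawingml_percent(raw):
    """DrawingML percent values: 38500 means 38.5 percent."""
    return raw / 1000


def presentationml_font_points(raw):
    """PresentationML font size is 100 times the point size."""
    return raw / 100


def wordprocessingml_font_points(raw):
    """WordprocessingML font size is twice the point size."""
    return raw / 2


print(twips_to_inches(1440))    # the one-inch page margin from the example above
print(drawingml_percent(38500))
```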
If you want colors in your Microsoft Office documents to respond when you switch themes, you know that you have to use theme colors. But, what you may not know is that you are not limited to the tints (lighter variations) and shades (darker variations) that appear in the Theme Colors palette.
Tints and shades of theme colors are created by adjusting the luminance of the color. So, while you have tints and shades automatically generated for your theme colors in the Theme Colors palettes throughout Word, Excel, and PowerPoint, you can further customize the tint or shade applied to specific content by adjusting the associated luminance in the XML. The attributes to edit vary by program and type of content.
PowerPoint and DrawingML content use the same approach to color adjustment. Notice the <a:lumMod> (luminance modification) tag and <a:lumOff> (luminance offset) tag shown in the gradient fill settings in Figure 2.
The lumMod value is the percent luminance. A lumMod value of "60000" as shown here, is 60% of the luminance of the original color.
When the color is a shade of the original theme color, the <a:lumMod> tag is the only one of the tags shown here that appears. The <a:lumOff> tag appears after the <a:lumMod> tag when the color is a tint of the original. With both values expressed as fractions of 1, the lumOff value always equals 1-lumMod, which is used in the tint calculation provided later in this article.
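The relationship between the two tags is easy to express in code. The following Python sketch (luminance_values is a hypothetical helper name, not part of any Office API) returns the attribute values you would type into the markup for a given luminance fraction, assuming the 100,000-times scale described earlier.

```python
def luminance_values(fraction, tint):
    """Return the val strings for lumMod (and lumOff when the color is a tint).

    fraction is the luminance modification as a value between 0 and 1.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    lum_mod = round(fraction * 100_000)
    values = {"lumMod": str(lum_mod)}
    if tint:
        # For a tint, lumOff is whatever remains: 1 - lumMod.
        values["lumOff"] = str(100_000 - lum_mod)
    return values


print(luminance_values(0.6, tint=True))
print(luminance_values(0.75, tint=False))
```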
You can find a brief glossary of the tags that you can use to modify colors (color transforms) in DrawingML in the Standard ECMA-376. A fantastic, searchable resource is available for the language reference through the Document Interop Initiative. Note that this link goes directly to the color transform glossary, but you can search for any tag or attribute from this site.
For text in Word (such as to add a custom theme color tint to a paragraph style), the color markup includes the current actual color value in hexadecimal, based on the active theme in the document (w:val attribute), the w:themeColor attribute to specify the theme color used, and the w:themeTint attribute or w:themeShade attribute for modification.
If a themeColor value is present, it overrides the w:val setting, so you don't need to update the w:val attribute in the tag. The value updates to display the new current value the next time the document is saved in Word.
The w:themeTint attribute or w:themeShade attribute is displayed as a hexadecimal encoding of the tint or shade value (between 0 and 255). For example, a value of 66 represents a 40% tint. The value to display is calculated as follows:
0.4 * 255 = 102. The Hex value for 102 is 66.
Many resources are available to convert between hexadecimal and decimal values. For example, Excel 2007 provides the HEX2DEC function and DEC2HEX function. Or, you can use Scientific mode in the Windows Vista calculator accessory program for hexadecimal/decimal conversions.
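If you have Python handy, the same conversion takes only a couple of lines. The helper names below are hypothetical and exist only for this sketch; the arithmetic is the 0-to-255 scaling described above.

```python
def fraction_to_theme_hex(fraction):
    """Convert a tint or shade fraction (0 to 1) to the two-digit hex string."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return format(round(fraction * 255), "02X")


def theme_hex_to_fraction(hex_value):
    """Convert a w:themeTint or w:themeShade hex string back to a fraction."""
    return int(hex_value, 16) / 255


print(fraction_to_theme_hex(0.4))   # 0.4 * 255 = 102, which is 66 in hex
print(theme_hex_to_fraction("66"))
```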
For Excel worksheet cell content (such as text color and cell shading), color settings are found in the styles.xml part and displayed as you see in the following line of code:
The tint attribute is used regardless of whether the color is a tint or a shade. Shades are represented as negative values (such as "-0.4" for a 40% shade).
Ever precise as Excel is, the value shown in the sample code here is for the 40% tint displayed in the Theme Colors palette. Also note that the theme color integer begins counting from left to right in the palette starting with zero. Theme color 3 is the dark 2 text/background color.
Despite having different ways to display the tint and shade percentages, all of the programs use the same method to calculate the resulting color. Convert the original RGB value to HSL (a few free RGB/HSL converters are available on the Internet) and then adjust the luminance (L) with one of the following equations before converting the HSL value back to RGB. (The % tint in the following equations refers to the tint, themeTint, themeShade, or lumMod values, as applicable.)
For a shade, the equation is luminance * %tint.
For a tint, the equation is luminance * %tint + (1-%tint). (Note that 1-%tint is equal to the lumOff value in DrawingML.)
Use an HSL conversion method that provides saturation and luminance as fractional values between 0 and 1 rather than as whole-number percents.
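The two equations above can be tried out with Python's standard colorsys module, which already works with values between 0 and 1. This is a sketch of the calculation exactly as this article states it, not a reimplementation of what any Office program does internally. The function names are my own, and fraction is the %tint value from the equations expressed between 0 and 1; how each program's stored value maps to that fraction is not handled here.

```python
import colorsys


def _adjust_luminance(rgb, new_luminance):
    """Convert an (r, g, b) tuple of 0-255 ints to HSL, adjust L, convert back."""
    r, g, b = (channel / 255 for channel in rgb)
    # colorsys orders the result as hue, luminance, saturation.
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l = new_luminance(l)
    return tuple(round(channel * 255) for channel in colorsys.hls_to_rgb(h, l, s))


def shade(rgb, fraction):
    """Shade: luminance * %tint."""
    return _adjust_luminance(rgb, lambda lum: lum * fraction)


def tint(rgb, fraction):
    """Tint: luminance * %tint + (1 - %tint)."""
    return _adjust_luminance(rgb, lambda lum: lum * fraction + (1 - fraction))


print(tint((128, 128, 128), 0.6))
print(shade((255, 0, 0), 0.6))
```

Because only the luminance changes, the hue and saturation of the original theme color carry through to the result.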
Editing the default shape type for a SmartArt graphic is an ideal task to conclude this article because it gives you a chance to see how easily you can apply what you've learned about reading and editing formatting markup.
SmartArt graphics are stored in a document ZIP package in a diagrams subfolder within the main document folder. For each SmartArt graphic in your document, you see numbered files for colors, data, drawing, layout and quick style. For example, the layout1.xml file contains the layout information for the first SmartArt graphic added to the document.
Say, for example, that the first SmartArt graphic in your document uses the Basic Block List layout. That layout uses rectangles, but you need it with rounded rectangles as shown in Figure 3.
If you use the Change Shape feature in the UI to change those blocks from rectangles to rounded rectangles, the new shapes that you add to the graphic are still rectangles. Instead, do the following:
In the ZIP package for your document, navigate to the layout1.xml file as described earlier and open that file for editing.
Find the <dgm:shape> tag that contains the type attribute. In this example, it looks like the following:
Change the value from rect to roundRect. Then save the XML part, place it back into the ZIP package if you had to extract it to edit it, and then open the document in its native program.
That's all there is to it. Your Basic Block List graphic now looks like the diagram shown in Figure 3.
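If you need to make this change in many documents, the steps above can be sketched in Python with the standard zipfile module. This is a minimal sketch under stated assumptions: it assumes the layout part uses the dgm prefix for the shape tag, it does a simple text substitution rather than a full XML parse, and the function names and the part name passed in are illustrative. Work on a copy of your document.

```python
import io
import re
import zipfile

# Matches the type attribute of a <dgm:shape> tag whose value is rect.
_RECT_SHAPE = re.compile(r'(<dgm:shape\b[^>]*?\btype=")rect(")')


def swap_layout_shape(layout_xml, new_type="roundRect"):
    """Return the layout markup with rect shapes changed to new_type."""
    return _RECT_SHAPE.sub(
        lambda match: match.group(1) + new_type + match.group(2), layout_xml
    )


def patch_package(package_bytes, part_name, new_type="roundRect"):
    """Return a new ZIP package (as bytes) with the given layout part edited.

    A ZIP entry cannot be edited in place, so every part is copied to a new
    package and only the requested part is changed along the way.
    """
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == part_name:
                text = data.decode("utf-8")
                data = swap_layout_shape(text, new_type).encode("utf-8")
            target.writestr(item, data)
    return output.getvalue()
```

For a Word document, for example, you might read the file's bytes, call patch_package with a part name such as word/diagrams/layout1.xml, and write the returned bytes to a new file.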
So, how did I know that the correct terminology for the rounded rectangle value is roundRect? I inserted a rounded rectangle onto a slide and then looked at its XML markup. Adding formatting to a document and then examining its XML is the easiest way to learn more about formatting your documents in the XML. For example, ever wonder how those crafty folks on the Microsoft Office product team created some of the Picture Style shadows that you can't reproduce in the Format Picture dialog box? Apply the style and check out the XML for yourself.
Office Open XML II: Editing Documents in the XML
Stephanie Krieger is a Microsoft Office System MVP and the author of two books, Advanced Microsoft Office Documents 2007 Edition Inside Out and Microsoft Office Document Designer. As a professional document consultant, Stephanie helps many global companies develop enterprise solutions for Microsoft Office on both platforms. She also frequently writes, presents, and creates content for Microsoft. You can reach Stephanie through her blog, arouet.net.