Finding Graphics in a Binary PowerPoint MS-PPT File

Summary: Learn how to locate bitmaps, drawings, WordArt, and other static graphics in a binary Microsoft PowerPoint (.ppt) file.

Applies to: Office 2007 | Office 2010 | PowerPoint | PowerPoint 2010 | VBA

Published: May 2011

Provided by: Microsoft Corporation

Contents

  • Introduction

  • Structures and Procedures

    • To Extract Bitmap Images from a .PPT File

    • To Extract Drawings from a .PPT File

    • To Locate a Bitmap in a Slide

  • Conclusion

  • Additional Resources

Introduction

The MS-PPT binary file format (.ppt) is used by Microsoft Office PowerPoint 2003, Microsoft PowerPoint 2002, Microsoft PowerPoint 2000, and Microsoft PowerPoint 97. Use the procedures in this article to extract images or drawings from a .ppt file, and to locate a bitmap in a slide.

Extracting images directly from the binary file lets you quickly scan many files for a particular image without opening the PowerPoint application. You can then remove the image completely, or replace it with another image of the same size, with minimal changes to the file. For example, you could strike all instances of a copyrighted image from a file set, or update all instances of a company logo.

Drawings are harder to replace than images; however, to know where an image appears in a slide, you must find the shape object that the image is anchored to. You can make property changes to drawings or shapes, such as editing the text of a piece of WordArt, with minimal difficulty if the drawings or shapes remain the same size in memory. Otherwise, you must also update the headers of the records that you edit so that they reflect the new record lengths.

Note

The recommended way to perform most programming tasks in Microsoft PowerPoint is to use the PowerPoint Primary Interop Assemblies. These are a set of .NET classes that provide a complete object model for working with Microsoft PowerPoint. This article series deals only with advanced scenarios, such as where Microsoft PowerPoint is not installed.

Structures and Procedures

All vector-based graphical elements in a .ppt file are stored as drawings inside DrawingContainer objects in the PowerPoint Document stream. These elements include clip art, WordArt, and any drawings or diagrams that consist of scalable shapes and lines. Bitmaps are stored centrally as binary large images or pictures (BLIPs) inside the Pictures stream, and referenced by the drawings where they appear. Both drawings and BLIPs use the [MS-ODRAW]: Office Drawing Binary File Format Structure Specification file format, which is the shared graphical format for Microsoft Excel, Microsoft PowerPoint, and Microsoft Word.

Note

All of the records in a PowerPoint document begin with an 8-byte record header, unless stated otherwise. The third and fourth bytes show the record type, and the last 4 bytes show the length of the record. You can use this information to identify records of interest and skip over the rest.
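The header layout in this note can be sketched in Python as follows. This is a minimal illustration, not a complete parser. The split of the first two bytes into a 4-bit version and a 12-bit instance, and the little-endian byte order, come from the [MS-PPT] record header definition rather than from this article; the names RecordHeader and read_record_header are hypothetical.

```python
import struct
from typing import NamedTuple


class RecordHeader(NamedTuple):
    rec_ver: int       # low 4 bits of the first two bytes
    rec_instance: int  # high 12 bits of the first two bytes
    rec_type: int      # third and fourth bytes
    rec_len: int       # last 4 bytes: length of the data that follows the header


def read_record_header(data: bytes, offset: int = 0) -> RecordHeader:
    """Decode the 8-byte record header that starts at offset (little-endian)."""
    ver_instance, rec_type, rec_len = struct.unpack_from("<HHI", data, offset)
    return RecordHeader(ver_instance & 0x000F, ver_instance >> 4, rec_type, rec_len)


# Hand-built sample header: a container (recVer 0xF) of type 0x03E8, 16 bytes long.
sample = bytes.fromhex("0f00e80310000000")
header = read_record_header(sample)
```

To skip a record that is not of interest, advance the read position by 8 + rec_len bytes.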

To Extract Bitmap Images from a .PPT File

  1. Open the Pictures stream.

    This stream contains any embedded bitmap images the user has copied into the file, as a series of OfficeArtBStoreDelay records. An OfficeArtBStoreDelay record is a pure array of OfficeArtBStoreContainerFileBlock records, and has no record header or other fields.

  2. For each OfficeArtBStoreContainerFileBlock record in the array, do the following:

    1. Read the third and fourth bytes of the record header (bytes 2 and 3, counting from zero) to get the record type.

    2. If record type = OfficeArtBlip (0xF018-0xF117), continue to the next step in this procedure.

      If record type = OfficeArtFBSE (0xF007), do the following:

      1. Skip the first 20 bytes after the record header.

      2. Read the next 4 bytes, which show the size of the bitmap as an unsigned integer.

      3. Skip the next 12 bytes.

      4. Read the .name field, which is a variable-length, null-terminated Unicode string that shows the name of the bitmap.

        The next field is .embeddedBlip, which is an OfficeArtBlip record.

    3. Read the record header of the OfficeArtBlip record. The third and fourth bytes (the record type) specify the file type the image would have if it were saved separately. The last 4 bytes of the record header show the length of the rest of the record. For more information about which type values correspond to which file types, see the [MS-ODRAW] specification, section 2.2.23.

    4. The rest of the OfficeArtBlip record is the actual bitmap image data. Save the image as whichever file type is specified by the record header.
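The procedure above can be sketched in Python as follows. The sketch assumes that the bytes of the Pictures stream have already been read from the compound file by some other means, and the function names are hypothetical. It also assumes, based on the [MS-ODRAW] layout of OfficeArtFBSE rather than on this article, that the second of the last four skipped bytes (cbName) gives the length of the name in bytes, and that the name is UTF-16LE. The payload is returned exactly as stored; the [MS-ODRAW] BLIP record types define further header fields ahead of the image bytes, which this sketch does not strip.

```python
import struct

OFFICE_ART_FBSE = 0xF007
BLIP_FIRST, BLIP_LAST = 0xF018, 0xF117


def read_header(buf, pos):
    """Return (recType, recLen) from the 8-byte record header at pos."""
    _, rec_type, rec_len = struct.unpack_from("<HHI", buf, pos)
    return rec_type, rec_len


def scan_pictures_stream(stream):
    """Return one dict per file block, in array order."""
    blocks = []
    pos = 0
    while pos + 8 <= len(stream):
        rec_type, rec_len = read_header(stream, pos)
        body = pos + 8
        end = body + rec_len
        entry = {"index": len(blocks), "offset": pos, "name": None,
                 "size": None, "blip_type": None, "payload": b""}
        blip_pos = pos
        if rec_type == OFFICE_ART_FBSE:
            # Skip 20 bytes, read the 4-byte size, then skip 12 bytes.
            (entry["size"],) = struct.unpack_from("<I", stream, body + 20)
            cb_name = stream[body + 33]  # assumption: cbName field, see lead-in
            name_bytes = stream[body + 36:body + 36 + cb_name]
            entry["name"] = name_bytes.decode("utf-16-le").rstrip("\x00")
            # The embedded OfficeArtBlip record, if any, follows the name.
            blip_pos = body + 36 + cb_name
            if blip_pos + 8 <= end:
                rec_type, rec_len = read_header(stream, blip_pos)
            else:
                rec_type = None
        if rec_type is not None and BLIP_FIRST <= rec_type <= BLIP_LAST:
            entry["blip_type"] = rec_type
            start = blip_pos + 8
            entry["payload"] = stream[start:start + rec_len]
        blocks.append(entry)
        pos = end  # move to the next file block
    return blocks
```

Each returned entry keeps its position in the array, which is the value needed later to match a shape to its bitmap.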

To Extract Drawings from a .PPT File

  1. Create a persist object directory, as described in Understanding the PowerPoint MS-PPT Binary File Format, in the first part of the procedure titled "Retrieving Slides from PowerPoint Files."

  2. In the persist object directory, check the record headers at each specified offset, and read each record header.

    1. If rh.recType equals RT_Document (0x03E8), this is the Document Container. Check the record header of each of its child containers until you find a record where rh.recType equals RT_DrawingGroup (0x040B).

      This is the drawing group container for the file. Note the location of this container.

    2. Where rh.recType equals RT_MainMaster (0x03F8) or RT_Slide (0x03EE):

      1. Check each child record for a record header where rh.recType equals RT_Drawing (0x040C).

      2. Parse that record as described in Understanding Graphics in Office Binary File Formats.
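The search in step 2 can be sketched in Python as follows. The sketch assumes that the PowerPoint Document stream is already available as bytes, and that persist_offsets is a list of stream offsets taken from a persist object directory built elsewhere. The function names and the shape of the returned dictionary are hypothetical. Parsing the contents of the drawing records is out of scope here.

```python
import struct

RT_DOCUMENT = 0x03E8
RT_SLIDE = 0x03EE
RT_MAIN_MASTER = 0x03F8
RT_DRAWING_GROUP = 0x040B
RT_DRAWING = 0x040C


def iter_children(buf, container_offset):
    """Yield (offset, recType, recLen) for each direct child of a container."""
    _, _, container_len = struct.unpack_from("<HHI", buf, container_offset)
    pos = container_offset + 8
    end = pos + container_len
    while pos + 8 <= end:
        _, rec_type, rec_len = struct.unpack_from("<HHI", buf, pos)
        yield pos, rec_type, rec_len
        pos += 8 + rec_len  # skip over the child without parsing it


def find_drawing_records(doc_stream, persist_offsets):
    """Locate the drawing group and the drawing of each slide or main master."""
    found = {"drawing_group": None, "drawings": []}
    for offset in persist_offsets:
        _, rec_type, _ = struct.unpack_from("<HHI", doc_stream, offset)
        if rec_type == RT_DOCUMENT:
            wanted = RT_DRAWING_GROUP
        elif rec_type in (RT_SLIDE, RT_MAIN_MASTER):
            wanted = RT_DRAWING
        else:
            continue
        for child_offset, child_type, _ in iter_children(doc_stream, offset):
            if child_type == wanted:
                if wanted == RT_DRAWING_GROUP:
                    found["drawing_group"] = child_offset
                else:
                    # Pair each drawing with the slide or master that owns it.
                    found["drawings"].append((offset, child_offset))
                break
    return found
```

If you already know which slide contains the graphic, pass only that slide's offset to limit the search.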

To Locate a Bitmap in a Slide

  1. Extract the bitmaps for the file, as described in the procedure "To Extract Bitmap Images from a .PPT File," and then record the position of each OfficeArtBStoreContainerFileBlock record in the OfficeArtBStoreDelay.rgfb array.

  2. Parse the drawings for the file as described in the procedure "To Extract Drawings from a .PPT File." If you know which slide contains the bitmap, you can parse just the drawings for that one slide.

  3. Check each shape for bitmaps.

    1. For each child of the OfficeArtSpContainer record that represents the current shape, scan the record headers for a record where rh.recType = OfficeArtFOPT (0xF00B), which is the Shape Primary Options attribute.

    2. Read the rest of the OfficeArtFOPT record, which consists of a property table.

      The property table is of type OfficeArtRGFOPTE, and has no record header. It consists of an array of 6-byte OfficeArtFOPTE property table entries, followed by a variable-size field for complex data.

    3. Read each property table entry until you find one where the opid.fBid attribute at bit 14 = 0x1, and then read the next 4 bytes as an unsigned integer. This integer specifies the position in the OfficeArtBStoreDelay.rgfb array of the OfficeArtBStoreContainerFileBlock record that contains the corresponding bitmap.

      If none of the property table entries for a given shape have opid.fBid = 0x1, there are no bitmaps associated with that shape.
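Step 3 can be sketched in Python as follows. The sketch assumes that the offset of an OfficeArtFOPT record inside a byte buffer is already known. It also assumes, based on [MS-ODRAW] rather than on this article, that the instance field of the record header (the high 12 bits of the first two bytes) holds the number of property table entries, and that the low 14 bits of each entry's first two bytes hold the property identifier. The function name is hypothetical.

```python
import struct

OFFICE_ART_FOPT = 0xF00B
ENTRY_SIZE = 6  # 2-byte opid field followed by a 4-byte value


def blip_references(buf, fopt_offset):
    """Return (property id, value) for each table entry whose fBid bit is set."""
    ver_instance, rec_type, rec_len = struct.unpack_from("<HHI", buf, fopt_offset)
    if rec_type != OFFICE_ART_FOPT:
        raise ValueError("not an OfficeArtFOPT record")
    count = ver_instance >> 4  # assumption: recInstance = number of entries
    if count * ENTRY_SIZE > rec_len:
        raise ValueError("property table is longer than the record")
    refs = []
    pos = fopt_offset + 8
    for _ in range(count):
        opid_raw, value = struct.unpack_from("<HI", buf, pos)
        prop_id = opid_raw & 0x3FFF     # bits 0-13
        f_bid = (opid_raw >> 14) & 0x1  # bit 14
        if f_bid:
            refs.append((prop_id, value))
        pos += ENTRY_SIZE
    return refs
```

An empty result means that no entry in the table has the fBid bit set, so the shape has no associated bitmap.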

Conclusion

This article has discussed the basic processes for extracting pictures and shapes from a binary format PowerPoint (.ppt) file. By building on these processes, you will be able to scan large file sets for drawings and bitmaps, tag them, and even replace them with updated versions.

Additional Resources

For more information, see the following resources: