How to Use the Microsoft Office Binary File Format Validator

Summary: This technical article is provided as supplemental documentation for the Microsoft Office Binary File Format Validator Beta. This article presents two examples that show how to use Microsoft Office Binary File Format Validator Beta to find non-compliance issues with .doc, .xls, and .ppt binary format files.

Applies to: Excel 2010 | Office 2007 | Office 2010 | Office client | Open XML | PowerPoint 2010 | SharePoint Server 2010 | VBA | Word 2010

This article uses validation scenarios to explain how to use the Microsoft Office Binary File Format Validator Beta to validate certain binary files. In this article, the validation scenarios are for a .ppt file that fails Office Validation and for a .doc file that does not fail Office Validation, but contains a validation error that is found by the Microsoft Office Binary File Format Validator Beta.

The following file types are validated by Microsoft Office Binary File Format Validator Beta:

Extension | Targeted applications
.doc | Microsoft Word 97, Microsoft Word 2000, and Microsoft Office Word 2003. These file formats are also understood by Microsoft Office Word 2007 and Microsoft Word 2010.
.xls | Microsoft Excel 97, Microsoft Excel 2000, and Microsoft Office Excel 2003. These file formats are also understood by Microsoft Office Excel 2007 and Microsoft Excel 2010.
.ppt | Microsoft PowerPoint 97, Microsoft PowerPoint 2000, and Microsoft Office PowerPoint 2003. These file formats are also understood by Microsoft Office PowerPoint 2007 and Microsoft PowerPoint 2010.

Opening .doc, .ppt, and .xls Files

Let’s start by setting up the scenario. You are a developer for a company that is creating a suite of applications that read and write .doc, .ppt, or .xls files.

To develop those applications, you have carefully read and understood the binary file specifications produced by Microsoft. You are now testing the files that you produced by opening them with Word, PowerPoint, or Excel. These applications perform a validation step called Office File Validation when they open files. When a user opens a binary file format file, it is first validated so that a corrupted and potentially harmful file is not opened by a user.

This example uses a .ppt file from Microsoft PowerPoint, opened in Microsoft Office 2010. The first indication that you have created a .ppt file that contains validation issues occurs when you try to open it in an Office 2010 application and the file opens in Protected View. This also indicates that the code that you wrote to generate the file is creating content that does not comply with the binary file format specifications.

Protected View is available only in Microsoft Office 2010 and is indicated by the red banner with the text: Office has detected a problem with this file. Editing it may harm your computer. Click for more details.

ProtectedView_PPT

This message indicates that the file was opened in Protected View. However, the validation that is built into the application does not allow for troubleshooting the problem. If your users were to open the file and receive this message, they might find it inconvenient to open the file for write access.

In this scenario, your focus is still on the code that generated the file that failed Office File Validation. Your code is under development, so you would move into the next phase where you examine the file by using the Microsoft Office Binary File Format Validator Beta.

The purpose of Office File Validation is to protect binary files from security breaches and perceived malicious threats. A file that passes Office File Validation and opens normally is not necessarily free of issues. If an issue is not a harmful threat, then Word, Excel, PowerPoint, and your own application can be robust enough to fix or ignore a deviation that is not material enough to prevent the file from opening.

Using Microsoft Office Binary File Format Validator Beta

You can find documentation about how to use the Microsoft Office Binary File Format Validator Beta in the help that accompanies it. You can also find the Microsoft Office Binary File Format Validator Beta help in the Microsoft MSDN library: Introduction to Microsoft Office Binary File Format Validator.

In general, when you use Microsoft Office Binary File Format Validator Beta, you should progress by using the three phases shown in the following illustration:

BFFValidator_Process

The Microsoft Office Binary File Format Validator Beta stops at the first failure point it reaches. Therefore, after you fix a failure, you need to re-validate the file to make sure that there are no additional errors.
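
Because each scan reports only one failure, a fix-and-rescan cycle is easier to automate if a script can read the result of each run. The following is a minimal sketch, assuming that the console text keeps the shape of the samples in this article (a "PASSED at" or "FAILED at" marker and a "Log at:" line). The names parse_console_output and ScanResult are hypothetical and are not part of the tool.

```python
import re
from typing import NamedTuple, Optional


class ScanResult(NamedTuple):
    passed: bool
    log_path: Optional[str]


def parse_console_output(text: str) -> ScanResult:
    """Extract the result and the log location from the validator's console text.

    Assumption: the text contains "PASSED at" or "FAILED at" and, usually,
    a "Log at: <path>" line, as in the samples in this article.
    """
    result = re.search(r"\b(PASSED|FAILED) at\b", text)
    if result is None:
        raise ValueError("no PASSED/FAILED marker found in validator output")
    log = re.search(r"^\s*Log at:\s*(.+?)\s*$", text, flags=re.MULTILINE)
    return ScanResult(result.group(1) == "PASSED", log.group(1) if log else None)


# Console text copied from the failing .ppt example in this article.
sample = "\n".join([
    r'BFFValidator: "C:\users\Alice\Documents\Sample1_Failing.ppt" FAILED at 01/25/11 10:44:35',
    r"Log at: C:\users\Alice\Sample_document.ppt.bffvalidator.01-25-11_10-44-35.xml",
])

scan = parse_console_output(sample)
print(scan.passed)    # False
print(scan.log_path)
```

A caller could repeat the scan after each fix until passed is True, and open the file named by log_path whenever it is False.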

Editing the File

Providing guidance about how to edit binary files is beyond the scope of this article. Understanding streams within the binary file and understanding offsets is required before you attempt to change the data in a given file that is failing the Microsoft Office Binary File Format Validator Beta scan.

The examples in this section show how to find the offset of a fault and illustrate, in broad strokes, how to fix the fault in the file itself. Use whichever tool you prefer to read the binary file, although a stream-based tool is easier to use.

The following sections show how to find the failure point in a .ppt file that opens in Protected View, and in a .doc file that opens normally but contains a validation error after a Microsoft Office Binary File Format Validator Beta scan.

Although these examples show how to edit a binary file by using an editor, it is important to remember that this is a troubleshooting step. The scenario is still about how to develop an application that generates a binary file that complies with the specification.

Finding a validation error in a .ppt file that fails validation

  1. To validate the Sample1_Failing.ppt file, shown in the screenshot in the section "Opening .doc, .ppt, and .xls Files," type the following at the command prompt:

    C:\Program Files\Microsoft Office\BFFValidator>bffvalidator.exe "C:\users\Alice\Documents\Sample1_Failing.ppt"
    
  2. After the scan runs, the Command Prompt window displays the following failure message:

    BFFValidator: "C:\users\Alice\Documents\Sample1_Failing.ppt" FAILED at 01/25/11 10:44:35
    Log at: C:\users\Alice\Sample_document.ppt.bffvalidator.01-25-11_10-44-35.xml
    See: https://msdn.microsoft.com/en-us/library/dd945780(v=office.12).aspx for more information
    

    The information at the command prompt includes a time stamp for when the scan finished and the location of the log file, which contains the detailed parse stack and offset information about the failure. It also includes a link to the Microsoft Open Specification section that describes the correct behavior for the failure point.

  3. At this point, locate the log file and examine it for the offset in the binary data where you expect to find the error. The following is the log file that is created by this validation scan.

    <BFFValidation path="C:\users\Alice\Documents\Sample1_Failing.ppt" datetime="01/25/11 10:44:35" result="FAILED">
      <ParseStack>
        <Type builtinType="Docfile" docName="MS-PPT" sectionTitle="Streams and Storages" msdnLink="https://msdn.microsoft.com/en-us/library/dd945780(v=office.12).aspx">
          <Info>Built-in type "Docfile": The root storage object of an OLE compound file. For more information, see https://msdn.microsoft.com/en-us/library/dd942138.aspx.</Info>
        </Type>
        <Type builtinType="Stream" docName="MS-PPT" sectionTitle="Streams and Storages" msdnLink="https://msdn.microsoft.com/en-us/library/dd945780(v=office.12).aspx" streamName="Current User" streamOffset="0" hexStreamOffset="0x0">
          <Info>Built-in type "Stream": Any stream object for OLE compound files. The entire file contents for other files.</Info>
        </Type>
        <Type docName="MS-PPT" sectionTitle="CurrentUserAtom" sectionNumber="2.3.2" msdnLink="https://msdn.microsoft.com/en-us/library/940D5700-E4D7-4FC0-AB48-FED5DBC48BC1" streamName="Current User" streamOffset="0" hexStreamOffset="0x0"/>
        <Type docName="MS-PPT" sectionTitle="DocumentContainer" sectionNumber="2.4.1" msdnLink="https://msdn.microsoft.com/en-us/library/6254C4D1-5217-4E16-B20D-C04DDCCE31C9" streamName="PowerPoint Document" streamOffset="0" hexStreamOffset="0x0"/>
        <Type docName="MS-PPT" sectionTitle="SlideListWithTextContainer" sectionNumber="2.4.14.3" msdnLink="https://msdn.microsoft.com/en-us/library/307E6D12-7304-47A8-ACBD-3E7B8041AD3C" streamName="PowerPoint Document" streamOffset="1182" hexStreamOffset="0x49e"/>
        <Type docName="MS-PPT" sectionTitle="SlideListWithTextSubContainerOrAtom" sectionNumber="2.4.14.4" msdnLink="https://msdn.microsoft.com/en-us/library/FC198575-E6FC-420F-8693-6714469EB710" streamName="PowerPoint Document" streamOffset="1190" hexStreamOffset="0x4a6"/>
        <Type docName="MS-PPT" sectionTitle="SlidePersistAtom" sectionNumber="2.4.14.5" msdnLink="https://msdn.microsoft.com/en-us/library/48DCE412-9692-4F93-AEB7-3D9FDD3A0A5A" streamName="PowerPoint Document" streamOffset="1190" hexStreamOffset="0x4a6"/>
        <Type docName="MS-PPT" sectionTitle="SlideContainer" sectionNumber="2.5.1" msdnLink="https://msdn.microsoft.com/en-us/library/4CAC0976-73D0-4AB3-A70B-E98B3CF1C312" streamName="PowerPoint Document" streamOffset="35647" hexStreamOffset="0x8b3f"/>
        <Type docName="MS-PPT" sectionTitle="SlideProgTagsContainer" sectionNumber="2.5.19" msdnLink="https://msdn.microsoft.com/en-us/library/C2263E42-180E-4249-BD93-A421EFD8719B" streamName="PowerPoint Document" streamOffset="37196" hexStreamOffset="0x914c"/>
        <Type docName="MS-PPT" sectionTitle="SlideProgTagsSubContainerOrAtom" sectionNumber="2.5.20" msdnLink="https://msdn.microsoft.com/en-us/library/2BC67516-D5AB-4D9D-8676-1A825C64C2A8" streamName="PowerPoint Document" streamOffset="37204" hexStreamOffset="0x9154"/>
        <Type docName="MS-PPT" sectionTitle="SlideProgBinaryTagContainer" sectionNumber="2.5.21" msdnLink="https://msdn.microsoft.com/en-us/library/E25EB8D2-627E-4104-B293-FC8ED82A098C" streamName="PowerPoint Document" streamOffset="37204" hexStreamOffset="0x9154"/>
        <Type docName="MS-PPT" sectionTitle="SlideProgBinaryTagSubContainerOrAtom" sectionNumber="2.5.22" msdnLink="https://msdn.microsoft.com/en-us/library/AC9AA6BD-3C15-49BD-81D9-5B8BFD966053" streamName="PowerPoint Document" streamOffset="37212" hexStreamOffset="0x915c"/>
        <Type docName="MS-PPT" sectionTitle="PP10SlideBinaryTagExtension" sectionNumber="2.5.24" msdnLink="https://msdn.microsoft.com/en-us/library/CCB82F60-E1AE-4379-B1E0-00909BB70B17" streamName="PowerPoint Document" streamOffset="37212" hexStreamOffset="0x915c"/>
        <Type docName="MS-PPT" sectionTitle="ExtTimeNodeContainer" sectionNumber="2.8.15" msdnLink="https://msdn.microsoft.com/en-us/library/83D39C58-0D30-46A4-BFFB-188D792CB5A7" streamName="PowerPoint Document" streamOffset="37256" hexStreamOffset="0x9188"/>
        <Type builtinType="BLOB" streamName="PowerPoint Document" streamOffset="37325" hexStreamOffset="0x91cd">
          <Info>Built-in type "BLOB": Binary data of any length with no further definition. The size of the data can be zero.</Info>
        </Type>
        <Type docName="MS-PPT" sectionTitle="ExtTimeNodeContainer" sectionNumber="2.8.15" msdnLink="https://msdn.microsoft.com/en-us/library/83D39C58-0D30-46A4-BFFB-188D792CB5A7" streamName="PowerPoint Document" streamOffset="37325" hexStreamOffset="0x91cd"/>
        <Type builtinType="BLOB" streamName="PowerPoint Document" streamOffset="104277" hexStreamOffset="0x19755">
          <Info>Built-in type "BLOB": Binary data of any length with no further definition. The size of the data can be zero.</Info>
        </Type>
        <Type docName="MS-PPT" sectionTitle="ExtTimeNodeContainer" sectionNumber="2.8.15" msdnLink="https://msdn.microsoft.com/en-us/library/83D39C58-0D30-46A4-BFFB-188D792CB5A7" streamName="PowerPoint Document" streamOffset="104277" hexStreamOffset="0x19755"/>
        <Type builtinType="BLOB" streamName="PowerPoint Document" streamOffset="104365" hexStreamOffset="0x197ad">
          <Info>Built-in type "BLOB": Binary data of any length with no further definition. The size of the data can be zero.</Info>
        </Type>
        <Type docName="MS-PPT" sectionTitle="ExtTimeNodeContainer" sectionNumber="2.8.15" msdnLink="https://msdn.microsoft.com/en-us/library/83D39C58-0D30-46A4-BFFB-188D792CB5A7" streamName="PowerPoint Document" streamOffset="104365" hexStreamOffset="0x197ad"/>
        <Type builtinType="BLOB" streamName="PowerPoint Document" streamOffset="104453" hexStreamOffset="0x19805">
          <Info>Built-in type "BLOB": Binary data of any length with no further definition. The size of the data can be zero.</Info>
        </Type>
        <Type docName="MS-PPT" sectionTitle="ExtTimeNodeContainer" sectionNumber="2.8.15" msdnLink="https://msdn.microsoft.com/en-us/library/83D39C58-0D30-46A4-BFFB-188D792CB5A7" streamName="PowerPoint Document" streamOffset="104453" hexStreamOffset="0x19805"/>
        <Type builtinType="BLOB" streamName="PowerPoint Document" streamOffset="104890" hexStreamOffset="0x199ba">
          <Info>Built-in type "BLOB": Binary data of any length with no further definition. The size of the data can be zero.</Info>
        </Type>
        <Type docName="MS-PPT" sectionTitle="ExtTimeNodeContainer" sectionNumber="2.8.15" msdnLink="https://msdn.microsoft.com/en-us/library/83D39C58-0D30-46A4-BFFB-188D792CB5A7" streamName="PowerPoint Document" streamOffset="104890" hexStreamOffset="0x199ba"/>
        <Type docName="MS-PPT" sectionTitle="TimeAnimateBehaviorContainer" sectionNumber="2.8.29" msdnLink="https://msdn.microsoft.com/en-us/library/BC65CD1C-14A7-4C0D-BC2D-192BAB64A713" streamName="PowerPoint Document" streamOffset="104946" hexStreamOffset="0x199f2"/>
        <Type docName="MS-PPT" sectionTitle="TimeBehaviorContainer" sectionNumber="2.8.34" msdnLink="https://msdn.microsoft.com/en-us/library/8D75CC5B-6F80-4B2E-980B-A521E2691E54" streamName="PowerPoint Document" streamOffset="105094" hexStreamOffset="0x19a86"/>
        <Type docName="MS-PPT" sectionTitle="TimePropertyList4TimeBehavior" sectionNumber="2.8.37" msdnLink="https://msdn.microsoft.com/en-us/library/51C4CC59-2D58-4AC9-8B25-4ABC1040780D" streamName="PowerPoint Document" streamOffset="105167" hexStreamOffset="0x19acf"/>
        <Type docName="MS-PPT" sectionTitle="TimeVariant4Behavior" sectionNumber="2.8.38" msdnLink="https://msdn.microsoft.com/en-us/library/758F2315-8EF7-4F1D-81DA-41ED4AB6683A" streamName="PowerPoint Document" streamOffset="105175" hexStreamOffset="0x19ad7"/>
        <Type docName="MS-PPT" sectionTitle="TimeOverride" sectionNumber="2.8.41" msdnLink="https://msdn.microsoft.com/en-us/library/20CE9A31-8EAA-4A42-B957-B9111BE76B2C" streamName="PowerPoint Document" streamOffset="105175" hexStreamOffset="0x19ad7"/>
        <Type builtinType="LONG" streamName="PowerPoint Document" streamOffset="105184" hexStreamOffset="0x19ae0" childId="3" hexChildId="0x3">
          <Info>Built-in type "LONG": Signed 4-byte integer.</Info>
        </Type>
      </ParseStack>
      <LastData><![CDATA[
    00 00 00 00 -- -- -- --  -- -- -- -- -- -- -- --  ....
    ]]></LastData>
    </BFFValidation>
    
  4. As you examine the log file, you find that the validation error is in the PowerPoint Document stream. The Microsoft Office Binary File Format Validator Beta reports that this stream starts at offset 0x0 (in hexadecimal): streamName="PowerPoint Document" streamOffset="0" hexStreamOffset="0x0".

    The error information is in the last entry, which has a stream offset of 0x19ae0 and a child ID of 3. The msdnLink attribute of the entry that precedes it gives the specific section link to the [MS-PPT] Open Specification that explains what the binary file content should look like.

    <Type docName="MS-PPT" sectionTitle="TimeOverride" sectionNumber="2.8.41" msdnLink="https://msdn.microsoft.com/en-us/library/20CE9A31-8EAA-4A42-B957-B9111BE76B2C" streamName="PowerPoint Document" streamOffset="105175" hexStreamOffset="0x19ad7"/>
        <Type builtinType="LONG" streamName="PowerPoint Document" streamOffset="105184" hexStreamOffset="0x19ae0" childId="3" hexChildId="0x3">
          <Info>Built-in type "LONG": Signed 4-byte integer.</Info>
        </Type>
      </ParseStack>
      <LastData><![CDATA[
    00 00 00 00 -- -- -- --  -- -- -- -- -- -- -- --  ....
    ]]></LastData>
    </BFFValidation>
    
  5. Now that you have the section link and child ID, find what the binary data should look like. To do so, examine the data in [MS-PPT] and compare it to the last four bytes read, which the Microsoft Office Binary File Format Validator Beta reports in the LastData element.

  6. The msdnLink attribute in the second-to-last Type element, msdnLink="https://msdn.microsoft.com/en-us/library/20CE9A31-8EAA-4A42-B957-B9111BE76B2C", points to the following data in [MS-PPT].

    TimeOverride Record

  7. The childId value of 3 points to the third item in the TimeOverride structure, which is the override record.

    The LastData element indicates that the last 4 bytes that were read by Microsoft Office Binary File Format Validator Beta are 00 00 00 00. However, in the Open Specification you can see that the value for this record MUST be 1.

  8. You are producing the .ppt file programmatically, so you want to fix the error in the application code that generated the file. However, to make sure that this is the only validation error in the file, it is also useful to fix the value in a hexadecimal editor. You can use any hexadecimal editor, but preferably one that shows the streams. To fix the value, follow these steps:

    1. Open the file in the hexadecimal editor and then locate the stream "PowerPoint Document". In the log file, this stream is assumed to start at offset 0x0, but if a different editor gives a different offset to the start of the stream, then you would need to add that value to the offset reported for the override record.

    2. Locate the last 4 bytes in the hexadecimal editor by going to the offset recorded in the log file (0x19ae0).

    3. After you have located the bytes, remember that the PowerPoint file format stores multi-byte values in little-endian order, so the bytes in the file appear in the reverse order of the value that they represent.

    4. Because the last 4 bytes are 00 00 00 00, reversing them makes no visible difference here. For illustration, reversing the byte order turns 12 34 into 34 12.

      To fix the Sample1_Failing.ppt document, overwrite the 4 bytes at offset 0x19ae0 with 01 00 00 00 (the value 1 in little-endian order) and then save the file as "Sample1_Repaired.ppt" in the hexadecimal editor.

  9. Re-run the Microsoft Office Binary File Format Validator Beta on the file that you fixed to verify that there are no additional file format validation errors.

    C:\Program Files\Microsoft Office\BFFValidator>bffvalidator.exe "C:\users\Alice\Documents\Sample1_Repaired.ppt"
    

    Because the issue that you just fixed was the only issue, the file now passes validation:

    PASSED at 01/25/11 11:40:12
    Log at: C:\users\Alice\Sample1_Repaired.ppt.bffvalidator.01-25-11_11-40-12.xml
    

    Now when the file is opened, it opens normally:

    PowerPoint file in normal view
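
The log reading in steps 3 through 7 can also be done by a script. The following is a minimal sketch, assuming that the log has the shape shown above: Type elements directly under ParseStack, with the failing entry last, and a LastData element that holds one line of 16 hexadecimal columns followed by an ASCII column. The function name summarize_failure and the shortened sample log are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the log shown above; only the last two entries are kept.
SAMPLE_LOG = """<BFFValidation path="Sample1_Failing.ppt" datetime="01/25/11 10:44:35" result="FAILED">
  <ParseStack>
    <Type docName="MS-PPT" sectionTitle="TimeOverride" sectionNumber="2.8.41" msdnLink="https://msdn.microsoft.com/en-us/library/20CE9A31-8EAA-4A42-B957-B9111BE76B2C" streamName="PowerPoint Document" streamOffset="105175" hexStreamOffset="0x19ad7"/>
    <Type builtinType="LONG" streamName="PowerPoint Document" streamOffset="105184" hexStreamOffset="0x19ae0" childId="3" hexChildId="0x3">
      <Info>Built-in type "LONG": Signed 4-byte integer.</Info>
    </Type>
  </ParseStack>
  <LastData><![CDATA[
00 00 00 00 -- -- -- --  -- -- -- -- -- -- -- --  ....
]]></LastData>
</BFFValidation>"""


def summarize_failure(log_xml: str) -> dict:
    """Collect the stream, offset, child ID, spec section, and last bytes read."""
    root = ET.fromstring(log_xml)
    types = root.findall("./ParseStack/Type")
    failing = types[-1]  # assumption: the last entry is the failure point
    # The closest entry that carries an msdnLink names the specification section.
    spec = next((t for t in reversed(types) if t.get("msdnLink")), None)
    # Assumption: one line of at most 16 byte columns, then an ASCII column.
    columns = root.findtext("LastData", default="").split()[:16]
    last_data = bytes(
        int(c, 16) for c in columns if re.fullmatch(r"[0-9A-Fa-f]{2}", c)
    )
    child_id = failing.get("childId")
    return {
        "stream": failing.get("streamName"),
        "offset": int(failing.get("streamOffset")),
        "child_id": int(child_id) if child_id is not None else None,
        "section": spec.get("sectionTitle") if spec is not None else None,
        "link": spec.get("msdnLink") if spec is not None else None,
        "last_data": last_data,
    }


info = summarize_failure(SAMPLE_LOG)
print(info["stream"], hex(info["offset"]), info["child_id"], info["section"])
```

The "--" placeholders are skipped because they mark columns for which no byte was read.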

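The hexadecimal edit in step 8 amounts to overwriting four bytes in place with a little-endian value. The following sketch shows that operation on an in-memory copy of a stream. It assumes that the "PowerPoint Document" stream has already been extracted by a compound-file tool of your choice, because the offsets in the log are relative to the start of the stream and not to the start of the .ppt file on disk. The function name overwrite_long and the stand-in data are hypothetical.

```python
import struct


def overwrite_long(stream: bytes, offset: int, value: int) -> bytes:
    """Return a copy of `stream` with a signed 4-byte little-endian integer
    written at `offset`, replacing the four bytes that were there."""
    if offset < 0 or offset + 4 > len(stream):
        raise ValueError("offset is outside the stream")
    patched = bytearray(stream)
    struct.pack_into("<i", patched, offset, value)  # "<i": little-endian LONG
    return bytes(patched)


# Stand-in for the extracted stream: zero padding up to the failure offset,
# the four failing bytes, and two trailing marker bytes (hypothetical data).
stream = bytes(0x19AE0) + b"\x00\x00\x00\x00" + b"\xAA\xBB"
fixed = overwrite_long(stream, 0x19AE0, 1)

print(fixed[0x19AE0:0x19AE4].hex(" "))  # 01 00 00 00
print(len(fixed) == len(stream))        # True
```

The length check matters: overwriting keeps every later offset in the stream unchanged, whereas inserting bytes would shift the data that follows.
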
Finding a validation error in a .doc file that fails validation

  1. A Word file that is named Sample1_Failing.doc opens in normal mode but it still contains a validation error.

    Word normal view

  2. To validate this file, follow steps similar to the procedure in the previous section to produce this log file:

    <BFFValidation path="C:\users\Alice\Failing\Sample1_Failing.doc" datetime="01/25/11 09:06:44" result="FAILED">
      <ParseStack>
        <Type builtinType="Docfile" docName="MS-DOC" sectionTitle="File Structure" msdnLink="https://msdn.microsoft.com/en-us/library/4eaddc8f-4abd-43bb-8fd4-aef9c6121737">
          <Info>Built-in type "Docfile": The root storage object of an OLE compound file. For more information, see https://msdn.microsoft.com/en-us/library/dd942138.aspx.</Info>
        </Type>
        <Type builtinType="Stream" docName="MS-DOC" sectionTitle="File Structure" msdnLink="https://msdn.microsoft.com/en-us/library/4eaddc8f-4abd-43bb-8fd4-aef9c6121737" streamName="WordDocument" streamOffset="0" hexStreamOffset="0x0">
          <Info>Built-in type "Stream": Any stream object for OLE compound files. The entire file contents for other files.</Info>
        </Type>
        <Type docName="MS-DOC" sectionTitle="Fib" sectionNumber="2.5.1" msdnLink="https://msdn.microsoft.com/en-us/library/9AEAA2E7-4A45-468E-AB13-3F6193EB9394" streamName="WordDocument" streamOffset="0" hexStreamOffset="0x0"/>
        <Type docName="MS-DOC" sectionTitle="FibRgFcLcb" sectionNumber="2.5.5" msdnLink="https://msdn.microsoft.com/en-us/library/175D2FE1-92DD-45D2-B091-1FE8A0C0D40A" streamName="WordDocument" streamOffset="154" hexStreamOffset="0x9a"/>
        <Type docName="MS-DOC" sectionTitle="FibRgFcLcb2003" sectionNumber="2.5.9" msdnLink="https://msdn.microsoft.com/en-us/library/F6B7D624-570C-4057-ACFD-CBA71D12F1A0" streamName="WordDocument" streamOffset="154" hexStreamOffset="0x9a"/>
        <Type docName="MS-DOC" sectionTitle="FibRgFcLcb2002" sectionNumber="2.5.8" msdnLink="https://msdn.microsoft.com/en-us/library/FCE09F81-704B-460D-9BCA-F7DC121AED66" streamName="WordDocument" streamOffset="154" hexStreamOffset="0x9a"/>
        <Type docName="MS-DOC" sectionTitle="FibRgFcLcb2000" sectionNumber="2.5.7" msdnLink="https://msdn.microsoft.com/en-us/library/265BCA68-C4EF-4A03-8517-61D7E79850EB" streamName="WordDocument" streamOffset="154" hexStreamOffset="0x9a"/>
        <Type docName="MS-DOC" sectionTitle="FibRgFcLcb97" sectionNumber="2.5.6" msdnLink="https://msdn.microsoft.com/en-us/library/0C9DF81F-98D0-454E-AD84-B612CD05B1A4" streamName="WordDocument" streamOffset="154" hexStreamOffset="0x9a"/>
        <Type docName="MS-DOC" sectionTitle="PlcBtePapx" sectionNumber="2.8.6" msdnLink="https://msdn.microsoft.com/en-us/library/76D3B8E1-337B-4812-A3F1-6B417BA6398D" streamName="1Table" streamOffset="1172" hexStreamOffset="0x494"/>
        <Type docName="MS-DOC" sectionTitle="PnFkpPapx" sectionNumber="2.9.207" msdnLink="https://msdn.microsoft.com/en-us/library/6B3D10C0-0B95-4533-93FE-CAEF5C09679B" streamName="1Table" streamOffset="1180" hexStreamOffset="0x49c"/>
        <Type docName="MS-DOC" sectionTitle="PapxFkp" sectionNumber="2.9.174" msdnLink="https://msdn.microsoft.com/en-us/library/34AAEAF3-9578-41AF-A3F5-C12F6F66BF1B" streamName="WordDocument" streamOffset="3072" hexStreamOffset="0xc00"/>
        <Type docName="MS-DOC" sectionTitle="BxPap" sectionNumber="2.9.23" msdnLink="https://msdn.microsoft.com/en-us/library/86DF4678-FF4D-4877-B61A-6C621906973F" streamName="WordDocument" streamOffset="3101" hexStreamOffset="0xc1d"/>
        <Type docName="MS-DOC" sectionTitle="PapxInFkp" sectionNumber="2.9.175" msdnLink="https://msdn.microsoft.com/en-us/library/580510B8-DF7A-467E-A51C-0D71EB15C7CD" streamName="WordDocument" streamOffset="3362" hexStreamOffset="0xd22"/>
        <Type builtinType="BLOB" streamName="WordDocument" streamOffset="3363" hexStreamOffset="0xd23">
          <Info>Built-in type "BLOB": Binary data of any length with no further definition. The size of the data can be zero.</Info>
        </Type>
        <Type docName="MS-DOC" sectionTitle="GrpPrlAndIstd" sectionNumber="2.9.114" msdnLink="https://msdn.microsoft.com/en-us/library/BD96F2AA-1318-4066-9723-4DB035EF412B" streamName="WordDocument" streamOffset="3363" hexStreamOffset="0xd23"/>
        <Type docName="MS-DOC" sectionTitle="GrpPrlAndIstd" msdnLink="https://msdn.microsoft.com/en-us/library/dd909647(office.12).aspx" streamName="WordDocument" streamOffset="3365" hexStreamOffset="0xd25"/>
        <Type docName="MS-DOC" sectionTitle="Prl" sectionNumber="2.2.5.2" msdnLink="https://msdn.microsoft.com/en-us/library/4EABFFA2-B8B6-444C-9A92-3291AB5035EF" streamName="WordDocument" streamOffset="3387" hexStreamOffset="0xd3b"/>
        <Type docName="MS-DOC" sectionTitle="Table Properties" msdnLink="https://msdn.microsoft.com/en-us/library/b39a6648-501c-4361-8366-4f042f579469" streamName="WordDocument" streamOffset="3389" hexStreamOffset="0xd3d"/>
        <Type docName="MS-DOC" sectionTitle="TableBordersOperand80" sectionNumber="2.9.303" msdnLink="https://msdn.microsoft.com/en-us/library/E334B793-1C10-4FED-8FAC-69C3F8FB41B6" streamName="WordDocument" streamOffset="3389" hexStreamOffset="0xd3d"/>
        <Type docName="MS-DOC" sectionTitle="Brc80MayBeNil" sectionNumber="2.9.18" msdnLink="https://msdn.microsoft.com/en-us/library/8458EDBD-C81C-4EC7-B5FF-C99C50575301" streamName="WordDocument" streamOffset="3390" hexStreamOffset="0xd3e"/>
        <Type docName="MS-DOC" sectionTitle="Brc80" sectionNumber="2.9.17" msdnLink="https://msdn.microsoft.com/en-us/library/CFAB8014-E477-4E33-B50F-A23B8476F6F3" streamName="WordDocument" streamOffset="3390" hexStreamOffset="0xd3e"/>
        <Type builtinType="BYTE" streamName="WordDocument" bitfield="True" bitOffsetWithinStruct="31" hexBitOffsetWithinStruct="0x1f" bitCount="1" streamOffsetOfStruct="3390" hexStreamOffsetOfStruct="0xd3e" streamOffset="3393" hexStreamOffset="0xd41" childId="7" hexChildId="0x7">
          <Info>Built-in type "BYTE": Unsigned 1-byte integer.</Info>
        </Type>
      </ParseStack>
      <LastData><![CDATA[
    18 0B 00 80 -- -- -- --  -- -- -- -- -- -- -- --  ....
    ]]></LastData>
    </BFFValidation>
    
  3. In this log file, the stream name is WordDocument, the child ID is 7, the offset is 0xd41, and the last read data is 18 0B 00 80.

  4. The msdnLink attribute in the second-to-last Type element, msdnLink="https://msdn.microsoft.com/en-us/library/CFAB8014-E477-4E33-B50F-A23B8476F6F3", points to the following data in [MS-DOC].

    Brc80 structure

  5. In the Brc80 structure, the 7th record is C, one bit in size, and the specification states that C MUST be zero.

  6. After you locate the last 4 bytes read in the file by using a hexadecimal editor, reverse the bytes, because the file stores them in little-endian order, to get 80 00 0B 18.

    Because bit fields are stored starting from the least significant end of the value, dptSpace (5 bits), A - fShadow (1 bit), B - fFrame (1 bit), and C - reserved (1 bit) are all stored in the leading byte, 80.

    If you convert 80 into binary, it becomes 1000 0000. The 5 bits of dptSpace are the first 5 from the right, followed by fShadow and fFrame, so that C - reserved is the leftmost bit, which is 1 here.

  7. After you fix this to 0000 0000 as is required by the specification, the hexadecimal becomes 00, so that the bytes now read 00 00 0B 18.

  8. To fix this in the file, you reverse it again to little-endian order so that the last read bytes in the file now read 18 0B 00 00.
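
The bit arithmetic in steps 6 through 8 can be checked with a few lines of code. The following is a sketch, assuming that the 4 bytes are read as one little-endian unsigned integer and that the reserved bit C is bit 31, which matches the bitOffsetWithinStruct value in the log. The names of the first three fields (dptLineWidth, brcType, ico) come from a reading of the Brc80 section of [MS-DOC] and should be treated as assumptions, as should the helper names decode_brc80 and clear_reserved.

```python
import struct

RESERVED_MASK = 1 << 31  # bit offset 31 within the 4-byte structure


def decode_brc80(raw: bytes) -> dict:
    """Split the 4 bytes of a Brc80 into its bit fields."""
    (value,) = struct.unpack("<I", raw)  # little-endian unsigned 32-bit
    return {
        "dptLineWidth": value & 0xFF,       # bits 0-7 (assumed field name)
        "brcType": (value >> 8) & 0xFF,     # bits 8-15 (assumed field name)
        "ico": (value >> 16) & 0xFF,        # bits 16-23 (assumed field name)
        "dptSpace": (value >> 24) & 0x1F,   # bits 24-28
        "fShadow": (value >> 29) & 1,       # bit 29 (A)
        "fFrame": (value >> 30) & 1,        # bit 30 (B)
        "reserved": (value >> 31) & 1,      # bit 31 (C)
    }


def clear_reserved(raw: bytes) -> bytes:
    """Return the same 4 bytes with the reserved bit set to zero."""
    (value,) = struct.unpack("<I", raw)
    return struct.pack("<I", value & ~RESERVED_MASK)


raw = bytes.fromhex("180B0080")  # the LastData bytes from the log
print(decode_brc80(raw)["reserved"])          # 1
print(clear_reserved(raw).hex(" ").upper())   # 18 0B 00 00
```

Packing and unpacking with "<I" handles the byte reversal, so no manual reordering is needed before or after the bit is cleared.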

Conclusion

You now have two examples of how to troubleshoot binary files so that they comply with the Microsoft Office binary file format specifications. How you fix the application code that generates the file is a detail better left to the reader. For simple, one-off fixes, opening the file in your preferred hexadecimal editor to examine and change bits is a practical way to troubleshoot your code. The examples show how to use the Microsoft Office Binary File Format Validator Beta to examine the files that your applications create, and outline a process for examining the binary data in the file so that you can improve the code that reads and writes a .doc, .xls, or .ppt file.

See Also

Other Resources

Plan Office File Validation settings for Office 2010