
Generating a Word 2007 Document by Using PowerTools for Open XML and Windows PowerShell

Office 2007

Summary: Learn how to use PowerTools for Open XML and Microsoft Windows PowerShell to create a Microsoft Office Word 2007 document that contains property data from other Word 2007 documents. (16 printed pages)

Office Visual How To

Applies to: 2007 Microsoft Office System, Microsoft Office Word 2007, Microsoft Windows PowerShell, PowerTools for Open XML

Joel Krist, iSoftStone

October 2009

Overview

PowerTools for Open XML is an open source project on CodePlex that you can use alongside Microsoft Windows PowerShell to create and process Open XML documents.

This visual how-to topic provides a PowerShell script that uses the following cmdlets from PowerTools for Open XML to create a Microsoft Office Word 2007 document that contains a table with values from the Title, Creator, and Description document properties of other Word 2007 documents:

  • Add-OpenXmlContent

  • Export-OpenXmlWordprocessing

  • Get-OpenXmlStyle

  • Set-OpenXmlStyle

  • Set-OpenXmlContentStyle

See It


Watch the Video

Length: 10:58 | Size: 14.90 MB | Type: WMV file


Code It

Download the sample code

This visual how-to topic includes a PowerShell script that creates a Word 2007 document. The new document contains a table that has the Title, Creator, and Description document property values from other Word 2007 documents that reside in the same folder as the script. In addition, it has a heading that is formatted as a Heading 1 style, followed by descriptive text under a heading that is formatted as a Heading 2 style. The following figure shows a typical document that the script might generate from three source documents.

Figure 1. Document Created by PowerShell Script


To illustrate how to create the PowerShell script, this section walks through the following steps:

  1. Installing PowerTools for Open XML.

  2. Creating the template files used by the PowerShell script.

  3. Creating and testing the PowerShell script.

Installing PowerTools for Open XML

PowerTools for Open XML is an open source project on CodePlex that provides sample source code, sample scripts, and guidance to help you build PowerShell cmdlets that create and modify Open XML documents.

PowerTools for Open XML requires the following software:

  • The Open XML Software Development Kit (SDK) for Microsoft Office. PowerTools for Open XML is built on top of the Open XML SDK, version 1.0.

  • Windows PowerShell. You must have Windows PowerShell installed to run PowerShell scripts that use the PowerTools for Open XML cmdlets or to execute the cmdlets interactively.

  • Microsoft .NET Framework 3.5

  • Visual Studio 2008 (optional)

    PowerTools for Open XML provides a Visual Studio solution you can use to build the class library that contains the PowerShell cmdlet functionality. You can use any edition of Visual Studio 2008 to build the solution; for example, Visual C# 2008 Express Edition.

    Staff DotNet provides PowerTools for Open XML as pre-built binaries. For more information about how to install and use the PowerTools for Open XML binaries, see this Staff DotNet blog post.

The following procedure details how to download, build, and install PowerTools for Open XML.

Downloading, Building, and Installing PowerTools for Open XML

  1. Verify that you have the prerequisite software installed.

  2. Open the PowerTools for Open XML project Downloads page on CodePlex, download the PowerTools .zip file, and then save it to a folder on a local drive.

    Figure 2. Downloading PowerTools for Open XML

  3. Open the folder where you saved the .zip file and extract the files.

  4. Start Visual Studio 2008 and open the OpenXml Power Tools.sln solution. If you are on Windows Vista, you must run Visual Studio as an administrator because the Visual Studio PowerTools solution automatically installs the PowerTools into PowerShell, which requires administrative privileges.

    Figure 3. Running Visual Studio as Administrator

  5. In Visual Studio, press CTRL+SHIFT+B to build the solution. Visual Studio builds the PowerTools for Open XML class library and installs it into PowerShell.

  6. Verify that Visual Studio successfully added PowerTools for Open XML to PowerShell by opening a PowerShell window and typing the following command.

    get-pssnapin -registered

    A successful registration displays the following screen.

    Figure 4. Verifying PowerTools Installation

Note:

To watch a screen-cast that shows how to install, build, and use PowerTools for Open XML, see this blog post by Eric White.

Creating the Template Files that the PowerShell Script Uses

The PowerShell script in this visual how-to article uses the following template files when it creates the new Word 2007 document:

  • Template.docx

  • Table.xml

  • TableRow.xml

The following section describes each of the template files and how to create them.

Important:

For the sake of simplicity, all of the template files and all of the source Word documents that the code uses to create the new document are assumed to be in the same folder as the PowerShell script. That folder will be referred to as the script folder for the remainder of this article.

The PowerShell script uses Template.docx to help format paragraphs in the new document with the Heading 1 and Heading 2 styles. To create Template.docx, use the following procedure.

Template.docx

  1. Start Microsoft Word 2007 and, in a new document, add text that is formatted with the Heading 1 and Heading 2 styles. The actual text is not important. The following figure shows an example of text formatted with the Heading 1 and Heading 2 styles.

    Figure 5. Template.docx

  2. Save the new document to the script folder with the name Template.docx.

The PowerShell script uses the WordProcessingML markup in Table.xml to help inject the document properties table into the new document. To create Table.xml, use the following procedure.

Table.xml

  1. Create a new file in any text editor and add the following WordProcessingML fragment to it.

    <w:tbl>
        <w:tblPr>
            <w:tblStyle w:val="LightList-Accent1"/>
            <w:tblW w:w="5000" w:type="pct"/>
            <w:tblLook w:val="04A0"/>
        </w:tblPr>
        <w:tblGrid>
            <w:gridCol/>
            <w:gridCol/>
            <w:gridCol/>
        </w:tblGrid>
        <w:tr>
            <w:tc>
                <w:p>
                    <w:r>
                        <w:t>Title</w:t>
                    </w:r>
                </w:p>
            </w:tc>
            <w:tc>
                <w:p>
                    <w:r>
                        <w:t>Creator</w:t>
                    </w:r>
                </w:p>
            </w:tc>
            <w:tc>
                <w:p>
                    <w:r>
                        <w:t>Description</w:t>
                    </w:r>
                </w:p>
            </w:tc>
        </w:tr>
    </w:tbl>
    
    

  2. Save the text file to the script folder with the name Table.xml.

The PowerShell script uses the WordProcessingML markup in TableRow.xml to add one row to the document properties table for each source document. To create TableRow.xml, use the following procedure.

TableRow.xml

  1. Create a new file in any text editor and add the following WordProcessingML fragment to it.

    <w:tr>
        <w:tc>
            <w:p>
                <w:r>
                    <w:t>Title</w:t>
                </w:r>
            </w:p>
        </w:tc>
        <w:tc>
            <w:p>
                <w:r>
                    <w:t>Creator</w:t>
                </w:r>
            </w:p>
        </w:tc>
        <w:tc>
            <w:p>
                <w:r>
                    <w:t>Description</w:t>
                </w:r>
            </w:p>
        </w:tc>
    </w:tr>
    
    
  2. Save the text file to the script folder with the name TableRow.xml.
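Because Table.xml and TableRow.xml use the w: prefix without declaring it, neither file can be loaded as a standalone XML document. As a rough sanity check before running the script, the following sketch wraps a fragment in a temporary root element that declares the WordprocessingML namespace and then tries to parse it. The function name Test-WordFragment and the wrapper element are illustrative assumptions; they are not part of PowerTools for Open XML.

```powershell
# Hypothetical helper (not part of PowerTools for Open XML): reports whether
# a WordprocessingML fragment is well-formed XML.
function Test-WordFragment
{
  param([string]$Fragment)

  # The fragment uses the w: prefix without declaring it, so wrap it in a
  # throwaway root element that binds w: to the WordprocessingML namespace.
  $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
  $wrapped = '<root xmlns:w="' + $ns + '">' + $Fragment + '</root>'

  $doc = New-Object System.Xml.XmlDocument
  try
  {
    $doc.LoadXml($wrapped)
    return $true
  }
  catch
  {
    # Any parse error means the fragment is not well-formed.
    return $false
  }
}

$goodRow = '<w:tr><w:tc><w:p><w:r><w:t>Title</w:t></w:r></w:p></w:tc></w:tr>'
$badRow  = 'w:tr><w:tc><w:p><w:r><w:t>Title</w:t></w:r></w:p></w:tc></w:tr>'

Test-WordFragment $goodRow   # well-formed fragment
Test-WordFragment $badRow    # opening angle bracket is missing
```

The second call is expected to fail the check, because the closing w:tr tag no longer has a matching opening tag. The check covers only well-formedness; it does not validate the markup against the WordprocessingML schema.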

Creating and Testing the PowerShell Script

Create a new file in any text editor, add the following PowerShell script to it, and then save the file to the script folder with the name ProcessDocProps.ps1.

# arg0 = Document to modify
# arg1 = Text to insert
function AddContent
{
  # Add the content after the last paragraph in the document.
  $stringToAdd = "<w:p><w:r><w:t>" + $args[1] + "</w:t></w:r></w:p>"
  Add-OpenXmlContent -Path $args[0] -PartPath '/word/document.xml' `
    -InsertionPoint '/w:document/w:body/w:p[last()]' `
    -Content $stringToAdd -SuppressBackups
}

# arg0 = Document to modify
function AddSummaryTable
{
  # Read the <w:tbl>...</w:tbl> fragment from the Table.xml file.
  $stringToAdd = [string]::join("`n", (gc -read 10kb Table.xml))
  
  # Add the table after the last paragraph in the document.
  Add-OpenXmlContent -Path $args[0] -PartPath '/word/document.xml' `
    -InsertionPoint '/w:document/w:body/w:p[last()]' `
    -Content $stringToAdd -SuppressBackups
}

# arg0 = Document to modify
# arg1 = Title
# arg2 = Creator
# arg3 = Description
function AddSummaryTableRow
{
  # Read the <w:tr>...</w:tr> fragment from the TableRow.xml file.
  $stringToAdd = [string]::join("`n", (gc -read 10kb TableRow.xml))
  
  # Replace the placeholders in the fragment with actual values.
  $stringToAdd = $stringToAdd.Replace("Title", $args[1])
  $stringToAdd = $stringToAdd.Replace("Creator", $args[2])
  $stringToAdd = $stringToAdd.Replace("Description", $args[3])

  # Add the table row after the last table row in the document.
  Add-OpenXmlContent -Path $args[0] -PartPath '/word/document.xml' `
    -InsertionPoint '/w:document/w:body/w:tbl/w:tr[last()]' `
    -Content $stringToAdd -SuppressBackups
}

Export-OpenXmlWordprocessing `
  -Text "Document Titles, Creators, and Descriptions" `
  -OutputPath ./SummaryDocument.docx

Get-OpenXmlStyle -Path ./Template.docx | `
  Set-OpenXmlStyle -Path ./SummaryDocument.docx -SuppressBackups

Get-OpenXmlStyle -Path ./Template.docx | `
  Set-OpenXmlContentStyle -Path ./SummaryDocument.docx `
    -InsertionPoint '/w:document/w:body/w:p[1]' `
    -StyleName 'Heading1' -SuppressBackups

AddContent ./SummaryDocument.docx "Overview"

Get-OpenXmlStyle -Path ./Template.docx | `
  Set-OpenXmlContentStyle -Path ./SummaryDocument.docx `
    -InsertionPoint '/w:document/w:body/w:p[2]' `
    -StyleName 'Heading2' -SuppressBackups

AddContent ./SummaryDocument.docx `
  ("This report lists the Title, Creator, and Description " + `
  "properties of the documents in the current directory.")
   
AddSummaryTable ./SummaryDocument.docx  

$sourceDocs = `
  get-openxmldocument "Test*.docx" -SuppressBackups -Readonly

foreach ($sourceDoc in $sourceDocs)
{
    $stream = $sourceDoc.CoreFilePropertiesPart.GetStream()
    
    $docPropsXml = `
      [xml]((new-object System.IO.StreamReader($stream)).ReadToEnd())
    
    $sourceDoc.Close()
    
    AddSummaryTableRow ./SummaryDocument.docx `
      $docPropsXml.coreProperties.Title.ToString() `
      $docPropsXml.coreProperties.Creator.ToString() `
      $docPropsXml.coreProperties.Description.ToString()
}

For the sake of simplicity and illustration, the PowerShell script searches for documents in the current folder whose name matches the form Test*.docx when it determines the source documents from which to pull the Title, Creator, and Description document properties.

To test the script, create at least one Word 2007 document in the script folder, give it a name that matches the form Test*.docx, and then set the Title, Creator, and Description properties of the document. You can open the document information panel, which displays the document properties, in several ways. The following procedure shows one of them.

Accessing Document Properties

  1. Create a new document in Microsoft Word 2007; whether the document contains any content is irrelevant here.

  2. Click the Word Office button and then click Prepare | Properties.

    Figure 6. Accessing Document Properties

  3. Word displays the document information panel. Set the values of the Author, Title, and Comments properties.

    Figure 7. Setting Document Properties


    Note:

    The Author and Comments properties that are exposed via the document information panel map to the Creator and Description document properties in WordProcessingML.

  4. Save the document to the script folder as a Word Document (*.docx) with a name that matches the form Test*.docx; for example, TestDocument1.docx.

  5. Close the document.

Use the following steps to test the PowerShell script:

Note:

The following steps include setting the Windows PowerShell execution policy to allow scripts to be run, which requires running with administrative privileges. Thus, you must run Windows PowerShell as an administrator.

Testing the PowerShell Script

  1. Verify that you are running Windows PowerShell as an administrator.

    Figure 8. Running Windows PowerShell as Administrator

  2. Type the following command at the PowerShell prompt to load PowerTools for Open XML into the current PowerShell session.

    add-pssnapin -name openxml.powertools

  3. Type the following command at the PowerShell prompt to set the PowerShell session execution policy to allow scripts to run.

    set-executionpolicy unrestricted

  4. In the PowerShell session, navigate to the script folder that contains the PowerShell script, template files, and source documents that you created earlier.

  5. Run the PowerShell script by typing its name with a relative path at the PowerShell prompt.

    .\processdocprops.ps1

    Figure 9. Running the Windows PowerShell Script

  6. The script runs and generates a document in the script folder named SummaryDocument.docx. Open SummaryDocument.docx in Word 2007 to verify that the Title, Creator, and Description document properties of the Test*.docx documents located in the script folder were processed correctly.

Read It

This section uses code snippets from the Code It section to describe how the PowerShell script creates a Word 2007 document that contains a table with values from the Title, Creator, and Description document properties of other Word 2007 documents.

To begin, the script defines three helper functions (AddContent, AddSummaryTable, and AddSummaryTableRow) that inject the text and the table into the summary document.

The AddContent function accepts two arguments: the document to modify and the text to add. It uses the Add-OpenXmlContent cmdlet to insert a new paragraph into the document after the last paragraph.

# arg0 = Document to modify
# arg1 = Text to insert
function AddContent
{
  # Add the content after the last paragraph in the document.
  $stringToAdd = "<w:p><w:r><w:t>" + $args[1] + "</w:t></w:r></w:p>"
  Add-OpenXmlContent -Path $args[0] -PartPath '/word/document.xml' `
    -InsertionPoint '/w:document/w:body/w:p[last()]' `
    -Content $stringToAdd -SuppressBackups
}
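The -InsertionPoint values in the helper functions are XPath expressions. The following sketch shows what the expressions for the first and last paragraph select in a small stand-in for /word/document.xml. It uses only the .NET System.Xml classes and assumes that the w: prefix is bound to the WordprocessingML namespace; it does not show how Add-OpenXmlContent evaluates the expression internally.

```powershell
# Build a tiny stand-in for /word/document.xml that has two paragraphs.
$ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
$markup = '<w:document xmlns:w="' + $ns + '"><w:body>' +
  '<w:p><w:r><w:t>First</w:t></w:r></w:p>' +
  '<w:p><w:r><w:t>Second</w:t></w:r></w:p>' +
  '</w:body></w:document>'
$doc = New-Object System.Xml.XmlDocument
$doc.LoadXml($markup)

# XPath queries need the w: prefix bound to the WordprocessingML namespace.
$nsmgr = New-Object System.Xml.XmlNamespaceManager($doc.NameTable)
$nsmgr.AddNamespace('w', $ns)

# w:p[1] selects the first paragraph in the body; w:p[last()] selects the last.
$firstParagraph = $doc.SelectSingleNode('/w:document/w:body/w:p[1]', $nsmgr)
$lastParagraph  = $doc.SelectSingleNode('/w:document/w:body/w:p[last()]', $nsmgr)

$firstParagraph.InnerText
$lastParagraph.InnerText
```

In the same way, the expression that ends in w:tbl/w:tr[last()] selects the last row of a table that is a direct child of the body.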

The AddSummaryTable function accepts one argument: the document to which to add the table. It uses the Add-OpenXmlContent cmdlet to insert a new table into the document after the last paragraph. The code uses the Get-Content (gc) cmdlet to read the WordProcessingML fragment that defines the table from Table.xml. The [string]::Join() method merges the collection of strings that gc returns into a single string that can be passed to the Add-OpenXmlContent cmdlet.

# arg0 = Document to modify
function AddSummaryTable
{
  # Read the <w:tbl>...</w:tbl> fragment from the Table.xml file.
  $stringToAdd = [string]::join("`n", (gc -read 10kb Table.xml))
  
  # Add the table after the last paragraph in the document.
  Add-OpenXmlContent -Path $args[0] -PartPath '/word/document.xml' `
    -InsertionPoint '/w:document/w:body/w:p[last()]' `
    -Content $stringToAdd -SuppressBackups
}
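The read-and-join step can be tried on its own. The following minimal sketch writes a three-line sample fragment to a temporary file (a stand-in for Table.xml, chosen only for illustration), reads it back with Get-Content, and joins the lines into one string.

```powershell
# Write a small sample fragment to a temporary file (stand-in for Table.xml).
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value '<w:tr>', '  <w:tc/>', '</w:tr>'

# Get-Content returns one string per line of the file.
$lines = Get-Content $path

# [string]::Join merges the lines into a single newline-separated string.
$fragment = [string]::Join("`n", $lines)

# Clean up the temporary file.
Remove-Item $path

$lines.Count
$fragment
```

The script in this article passes -read 10kb so that Get-Content returns the lines in large batches instead of one at a time; for a small template file the joined result should be the same.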

The AddSummaryTableRow function accepts four arguments: the document that contains the table to which to add the row, and the Title, Creator, and Description property values of the source document. The code uses the Get-Content (gc) cmdlet to read the WordProcessingML fragment that defines the table row from TableRow.xml. The [string]::Join() method concatenates the collection of strings that gc returns into a single string. The function then replaces the placeholder values in the markup with the provided document property values, and uses the Add-OpenXmlContent cmdlet to insert the new table row into the document after the last table row.

# arg0 = Document to modify
# arg1 = Title
# arg2 = Creator
# arg3 = Description
function AddSummaryTableRow
{
  # Read the <w:tr>...</w:tr> fragment from the TableRow.xml file.
  $stringToAdd = [string]::join("`n", (gc -read 10kb TableRow.xml))
  
  # Replace the placeholders in the fragment with actual values.
  $stringToAdd = $stringToAdd.Replace("Title", $args[1])
  $stringToAdd = $stringToAdd.Replace("Creator", $args[2])
  $stringToAdd = $stringToAdd.Replace("Description", $args[3])

  # Add the table row after the last table row in the document.
  Add-OpenXmlContent -Path $args[0] -PartPath '/word/document.xml' `
    -InsertionPoint '/w:document/w:body/w:tbl/w:tr[last()]' `
    -Content $stringToAdd -SuppressBackups
}
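The three consecutive Replace calls work for simple values, but they have two limitations: a property value that contains a character such as & or < produces invalid markup, and a Title that contains the word Creator or Description is modified again by the later calls. The following sketch shows one possible way to handle both cases by escaping each value and replacing all placeholders in a single pass. The function name New-SummaryRowXml is a hypothetical helper, not part of the original script or of PowerTools for Open XML.

```powershell
# Hypothetical helper: fills the Title, Creator, and Description placeholders
# of a TableRow.xml-style fragment and returns the resulting markup.
function New-SummaryRowXml
{
  param([string]$Template, [string]$Title, [string]$Creator, [string]$Description)

  # Escape &, <, >, and quotation marks so property text cannot break the markup.
  $values = @{
    'Title'       = [System.Security.SecurityElement]::Escape($Title)
    'Creator'     = [System.Security.SecurityElement]::Escape($Creator)
    'Description' = [System.Security.SecurityElement]::Escape($Description)
  }

  # Replace all three placeholders in one pass, so that text inserted for one
  # placeholder is never scanned again for the others.
  $pattern = '<w:t>(Title|Creator|Description)</w:t>'
  return [regex]::Replace($Template, $pattern, {
    param($m)
    '<w:t>' + $values[$m.Groups[1].Value] + '</w:t>'
  })
}

$template = '<w:tr>' +
  '<w:tc><w:p><w:r><w:t>Title</w:t></w:r></w:p></w:tc>' +
  '<w:tc><w:p><w:r><w:t>Creator</w:t></w:r></w:p></w:tc>' +
  '<w:tc><w:p><w:r><w:t>Description</w:t></w:r></w:p></w:tc>' +
  '</w:tr>'

$row = New-SummaryRowXml -Template $template `
  -Title 'Creator Guide' -Creator 'R&D' -Description 'a < b'
$row
```

The returned string could then be passed to Add-OpenXmlContent with the -Content parameter, in the same way that AddSummaryTableRow passes $stringToAdd.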

After it defines the helper functions, the script begins to create the summary document. First, it uses the Export-OpenXmlWordprocessing cmdlet to create a new Word 2007 document in the current folder named SummaryDocument.docx. The document contains a single line of text that is passed with the -Text parameter.

Export-OpenXmlWordprocessing `
  -Text "Document Titles, Creators, and Descriptions" `
  -OutputPath ./SummaryDocument.docx

Then, the script uses the Get-OpenXmlStyle cmdlet to load the style part from the Template.docx file that is in the current folder and pipes the style part to the Set-OpenXmlStyle cmdlet. This creates the style part that contains the definitions of the Heading 1 and Heading 2 styles in the new SummaryDocument.docx file.

Get-OpenXmlStyle -Path ./Template.docx | `
  Set-OpenXmlStyle -Path ./SummaryDocument.docx -SuppressBackups

From there, the script uses a combination of the Get-OpenXmlStyle and Set-OpenXmlContentStyle cmdlets to set the formatting of the first paragraph in the summary document to the Heading 1 style.

Get-OpenXmlStyle -Path ./Template.docx | `
  Set-OpenXmlContentStyle -Path ./SummaryDocument.docx `
    -InsertionPoint '/w:document/w:body/w:p[1]' `
    -StyleName 'Heading1' -SuppressBackups

Now, the script uses the AddContent function to add a new paragraph to the document and then again uses the Get-OpenXmlStyle and Set-OpenXmlContentStyle cmdlets to set the style of the new paragraph, this time to Heading 2.

AddContent ./SummaryDocument.docx "Overview"

Get-OpenXmlStyle -Path ./Template.docx | `
  Set-OpenXmlContentStyle -Path ./SummaryDocument.docx `
    -InsertionPoint '/w:document/w:body/w:p[2]' `
    -StyleName 'Heading2' -SuppressBackups

To add a new paragraph, the script uses the AddContent function; to add a table to the document, it uses the AddSummaryTable function.

AddContent ./SummaryDocument.docx `
  ("This report lists the Title, Creator, and Description " + `
  "properties of the documents in the current directory.")
   
AddSummaryTable ./SummaryDocument.docx  

Finally, the script processes the source documents in the current folder. It uses the Get-OpenXmlDocument cmdlet to get the collection of documents in the current folder whose names are of the form Test*.docx. It then iterates through the collection of documents, getting a stream on the current document's core file properties part. It creates a System.IO.StreamReader object on the stream and reads the contents of the core file properties part into an XML document by using the PowerShell [xml] cast. The script then takes advantage of the capability in PowerShell to expose the elements of the XML document as though they were properties on an object, and gets the values of the Title, Creator, and Description document properties. The property values are then passed to the AddSummaryTableRow function to add a new row to the table with the current document property values.

$sourceDocs = `
  get-openxmldocument "Test*.docx" -SuppressBackups -Readonly

foreach ($sourceDoc in $sourceDocs)
{
    $stream = $sourceDoc.CoreFilePropertiesPart.GetStream()
    
    $docPropsXml = `
      [xml]((new-object System.IO.StreamReader($stream)).ReadToEnd())
    
    $sourceDoc.Close()
    
    AddSummaryTableRow ./SummaryDocument.docx `
      $docPropsXml.coreProperties.Title.ToString() `
      $docPropsXml.coreProperties.Creator.ToString() `
      $docPropsXml.coreProperties.Description.ToString()
}
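The loop calls ToString() on each property, which raises an error if a source document has no element for that property, because the value is then $null. The following sketch shows a more defensive way to read the values. It runs against an inline sample that only approximates the shape of a core file properties part, and the function name Get-CorePropertyText is a hypothetical helper rather than part of PowerTools for Open XML.

```powershell
# Inline sample that approximates a core file properties part.
# The description element is deliberately left out.
$cp = 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties'
$dc = 'http://purl.org/dc/elements/1.1/'
$coreXml = '<cp:coreProperties xmlns:cp="' + $cp + '" xmlns:dc="' + $dc + '">' +
  '<dc:title>Quarterly Report</dc:title>' +
  '<dc:creator>Pat</dc:creator>' +
  '</cp:coreProperties>'
$docPropsXml = [xml]$coreXml

# Hypothetical helper: returns the text of a child element, or an empty
# string if the element is not present.
function Get-CorePropertyText
{
  param([System.Xml.XmlElement]$Properties, [string]$Name)

  # Match on the local name so that no namespace manager is needed.
  $node = $Properties.SelectSingleNode("*[local-name()='" + $Name + "']")
  if ($null -eq $node) { return '' }
  return $node.InnerText
}

$props       = $docPropsXml.DocumentElement
$title       = Get-CorePropertyText $props 'title'
$creator     = Get-CorePropertyText $props 'creator'
$description = Get-CorePropertyText $props 'description'   # absent in the sample
```

With values read this way, a document that lacks one of the properties would produce an empty table cell instead of stopping the script.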

Explore It
