Using PHP to Create a Word 2007 Document

Summary:  Learn how to use PHP to transform XML data into the Open XML format. (10 printed Pages)

Office Visual How To

Applies to:  2007 Microsoft Office system, Microsoft Office Word 2007, Microsoft Open XML SDK 2.0, PHP version 5.3.0

Michael Case, iSoftStone

December 2009

Overview

You can use XSL Transformations (XSLT) to transform XML data into the Office Open XML format that Microsoft Office Word 2007 uses, and in this way create new Word 2007 documents from XML data. You can simplify the transformation by starting with an existing Word 2007 document that already has the desired layout.

This Visual How To shows how to use XSLT and PHP to create a new Word 2007 document. It shows how to create an XSL Transform file that is based on an existing Word 2007 document, and how to use that XSL Transform file to create a Word 2007 document from data that is stored in an XML file.

This Visual How To uses PHP version 5.3.0, which you must install to follow the steps provided here.

See It

Watch the Video

Length: 07:33 | Size: 8.30 MB | Type: WMV file


Code It


The solution presented in this Visual How To is a PHP script that you run from the Windows Command Prompt. The script takes an XML data file, an XSL Transform file, and a Word 2007 template document, and uses XSLT to create a well-formatted Word 2007 document that contains the data from the XML data file.

This section shows how to create the solution:

  1. Creating an XML data file.

  2. Creating a Word 2007 template document that has the desired layout.

  3. Extracting the contents of the main document part of the Word 2007 template document.

  4. Creating an XSL Transform file that is based on the extracted content.

  5. Adding Transforms to the XSL Transform file.

  6. Configuring PHP to enable the XSL extension.

  7. Creating a PHP script using Notepad.

  8. Running the PHP script from the Windows Command Prompt.

Creating an XML Data File

The sample code for this Visual How To assumes that a personal movie library XML data file that is named MyMovies.xml exists in the C:\Temp directory.

To create the MyMovies.xml data source

  1. Start Visual Studio 2008.

  2. From the File menu, point to New and then click File.

  3. In the New File dialog box, select XML File.

  4. Copy the following XML example into the new file.

    <?xml version="1.0" encoding="UTF-8"?>
    <Movies>
      <Genre name="Action">
        <Movie>
          <Name>Crash</Name>
          <Released>2005</Released>
        </Movie>
      </Genre>
      <Genre name="Drama">
        <Movie>
          <Name>The Departed</Name>
          <Released>2006</Released>
        </Movie>
        <Movie>
          <Name>The Pursuit of Happyness</Name>
          <Released>2006</Released>
        </Movie>
      </Genre>
      <Genre name="Comedy">
        <Movie>
          <Name>The Bucket List</Name>
          <Released>2007</Released>
        </Movie>
      </Genre>
    </Movies>
    
  5. Save the document as C:\Temp\MyMovies.xml.
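Before you build the transform, it can help to confirm that MyMovies.xml is well formed and that the paths the style sheet will use later (Movies/Genre, @name, Movie, Name, and Released) select what you expect. The following is a minimal sketch that assumes the PHP DOM extension is available; summarizeMovies is a hypothetical helper name, not part of the sample.

```php
<?php
// Hypothetical helper (not part of the original sample): loads movie library
// XML and returns the genres and movies that the XSL transform will see.
function summarizeMovies($xml)
{
    $document = new DOMDocument();
    // loadXML() returns false (and emits a warning) if the XML is not well formed.
    if (!$document->loadXML($xml)) {
        throw new RuntimeException('The movie data is not well-formed XML.');
    }

    $xpath = new DOMXPath($document);
    $summary = array();
    // Same elements that the outer xsl:for-each selects from the document root.
    foreach ($xpath->query('/Movies/Genre') as $genre) {
        $movies = array();
        // Same relative path that the inner xsl:for-each uses.
        foreach ($xpath->query('Movie', $genre) as $movie) {
            $name = $xpath->evaluate('string(Name)', $movie);
            $released = $xpath->evaluate('string(Released)', $movie);
            $movies[] = "$name ($released)";
        }
        // The name attribute is what xsl:value-of select="@name" reads.
        $summary[$genre->getAttribute('name')] = $movies;
    }
    return $summary;
}
```

For example, print_r(summarizeMovies(file_get_contents('C:\Temp\MyMovies.xml'))); should list the three genres and the four movies from the data file.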

Creating a Word 2007 Template Document

The sample code uses an existing Word 2007 document to help simplify creating the XSL Transform file. The code assumes that a Word 2007 document named MyMoviesTemplate.docx is in the C:\Temp directory.

To create the MyMoviesTemplate.docx document

  1. Start Word 2007.

  2. Add text and placeholders for the XML data, following the layout shown in Figure 1. The placeholders in this document are Genre Name (formatted with the Heading 2 style), Movie Title (bold, in a bulleted list), and (year). When the XSL transform is performed, these placeholders are replaced with data from the XML data file that you created previously.

    Figure 1. Word 2007 Template Document

  3. Save the document as C:\Temp\MyMoviesTemplate.docx.

Extracting the Contents of the Main Document Part of the Template Document

To simplify creating the XSL Transform file, you can use the contents of the Word 2007 template document that you created in the previous step as a starting point. Word 2007 documents use the Open Packaging Conventions (OPC). This means that Word documents are really .zip files that contain XML, binary, and other kinds of files. By adding the .zip extension to the end of the Word 2007 document name, you can then use tools such as WinZip or Windows Explorer to examine and extract the contents of the document.

To create the XSL Transform file, the Word 2007 template document that you created earlier is temporarily renamed with a .zip extension. The document.xml file is then extracted from the template document to the C:\Temp directory, renamed to MyMovies.xslt and converted from XML to XSLT.
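Because the package is an ordinary ZIP archive, you can also inspect it from PHP without renaming it. The following minimal sketch assumes that the PHP zip extension, which provides the ZipArchive class, is available; listPackageParts is a hypothetical helper name, not part of the sample.

```php
<?php
// Hypothetical helper (not part of the original sample): lists the parts
// stored in an Open XML package, which is an ordinary ZIP archive.
function listPackageParts($packagePath)
{
    $zip = new ZipArchive();
    // open() returns true on success or an integer error code on failure.
    $result = $zip->open($packagePath);
    if ($result !== true) {
        throw new RuntimeException("Could not open $packagePath (error $result).");
    }

    $parts = array();
    // Each entry in the archive is one part of the package.
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $parts[] = $zip->getNameIndex($i);
    }
    $zip->close();
    return $parts;
}
```

For a document saved by Word 2007, the list typically includes [Content_Types].xml, _rels/.rels, and word/document.xml, the main document part used in the following steps.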

To extract the contents of the main document part of the template document

  1. Start Windows Explorer and navigate to the folder that contains the MyMoviesTemplate.docx file that you created earlier.

  2. Rename MyMoviesTemplate.docx to MyMoviesTemplate.docx.zip to gain access to the underlying Open XML files.

  3. Use Windows Explorer to explore the contents of MyMoviesTemplate.docx.zip. Navigate to the word folder in the .zip file, copy the document.xml file to the C:\Temp folder, and then rename the copy to MyMovies.xslt.

    Figure 2. Copying the Document.XML File

  4. Rename MyMoviesTemplate.docx.zip back to MyMoviesTemplate.docx.
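If you prefer not to rename the template, the same extraction can be done from PHP. The following is a minimal sketch, assuming the PHP zip extension is available and that the main document part is stored at word/document.xml, as it is in documents saved by Word 2007. extractMainDocumentPart is a hypothetical helper name, not part of the sample.

```php
<?php
// Hypothetical helper (not part of the original sample): copies the main
// document part out of a .docx file without renaming the file to .zip.
function extractMainDocumentPart($docxPath, $outputPath)
{
    $zip = new ZipArchive();
    $result = $zip->open($docxPath);
    if ($result !== true) {
        throw new RuntimeException("Could not open $docxPath (error $result).");
    }

    // Assumption: the main document part is at word/document.xml.
    $content = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($content === false) {
        throw new RuntimeException("$docxPath has no word/document.xml part.");
    }

    // Write the part unchanged; it is converted to XSLT in the next procedure.
    if (file_put_contents($outputPath, $content) === false) {
        throw new RuntimeException("Could not write $outputPath.");
    }
    return strlen($content);
}
```

For example, extractMainDocumentPart('C:\Temp\MyMoviesTemplate.docx', 'C:\Temp\MyMovies.xslt'); produces the same MyMovies.xslt starting file as the manual steps.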

Creating an XSL Transform File Based on the Extracted Content

The next step is to convert the MyMovies.xslt file to an XSL Transform file.

To create the MyMovies.xslt XSL transform file

  1. In Visual Studio 2008, open the MyMovies.xslt file that you created in the previous step.

  2. Do the following to convert the document structure of MyMovies.xslt from XML to XSLT:

    Remove the following line from the top of the document.
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

    Add the following line to the top of the document.
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    Close the style sheet element by adding the following line to the very end of the document.
    </xsl:stylesheet>

  3. Add an <xsl:template> element around the existing <w:document> element.

    <xsl:template match="/">
      <w:document ...>
        <w:body>
          ...
        </w:body>
      </w:document>
    </xsl:template>
    
  4. Save the changes to the MyMovies.xslt file.
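Steps 2 and 3 can also be scripted. The following minimal sketch assumes the PHP DOM extension is available; wrapInStylesheet is a hypothetical helper name, not part of the sample. Instead of removing the standalone declaration by hand, it writes a new, plain XML declaration, which is permitted at the top of an XSLT file.

```php
<?php
// Hypothetical helper (not part of the original sample): wraps the root
// element of document.xml in an xsl:stylesheet element and an
// <xsl:template match="/"> element.
function wrapInStylesheet($documentXml)
{
    $xslNamespace = 'http://www.w3.org/1999/XSL/Transform';

    $source = new DOMDocument();
    if (!$source->loadXML($documentXml)) {
        throw new RuntimeException('document.xml is not well-formed XML.');
    }

    $xslt = new DOMDocument('1.0', 'UTF-8');
    $stylesheet = $xslt->createElementNS($xslNamespace, 'xsl:stylesheet');
    $stylesheet->setAttribute('version', '1.0');
    $xslt->appendChild($stylesheet);

    $template = $xslt->createElementNS($xslNamespace, 'xsl:template');
    $template->setAttribute('match', '/');
    $stylesheet->appendChild($template);

    // importNode() with $deep = true copies <w:document> and everything in it,
    // including the namespace declarations on the root element.
    $template->appendChild($xslt->importNode($source->documentElement, true));

    return $xslt->saveXML();
}
```

For example, file_put_contents('C:\Temp\MyMovies.xslt', wrapInStylesheet(file_get_contents('C:\Temp\MyMovies.xslt'))); converts the extracted file in place.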

Adding Transforms to the XSL Transform File

In the next step, you add XSL Transform elements to list the name of each genre and information for each movie within the genre in the XML data file. This transform uses the xsl:value-of and xsl:for-each elements.

  • Use the xsl:value-of elements to replace the Genre Name, Movie Title, and (year) placeholders in the Word 2007 template document.

  • Use two xsl:for-each elements to list each genre and its movies. When you place the xsl:for-each elements, make sure that each loop encloses every Open XML element that belongs to the genre or movie text you repeat (the whole <w:p> paragraph, including its <w:pPr> properties and <w:r> runs), so that the transformation outputs valid Open XML.

The following example is a fragment from the contents of the original document.xml showing the placeholder text from the template document.

...
<w:p w:rsidR="00EC137C" w:rsidRPr="00BF ...
  <w:pPr>
    <w:pStyle w:val="Heading2"/>
  </w:pPr>
  <w:r w:rsidRPr="00BF350E">
    <w:t>Genre Name</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00EC137C" w:rsidRPr="00EC1 ...
  <w:pPr>
    <w:pStyle w:val="ListParagraph"/>
    <w:numPr>
      <w:ilvl w:val="0"/>
      <w:numId w:val="1"/>
    </w:numPr>
  </w:pPr>
  <w:r w:rsidRPr="00BF350E">
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:t>Movie Title</w:t>
  </w:r>
  <w:r w:rsidR="00C46B60">
    <w:t xml:space="preserve"> (year)</w:t>
  </w:r>
</w:p>
...

The following example is an XSLT fragment of the modified MyMovies.xslt file showing the changes that were made to include the XSL Transform elements.

<!-- for-each loop added for Genre.  This loop includes the Open XML elements for the paragraph the Genre placeholder is in as well as all paragraphs for the Movies. -->
<xsl:for-each select="Movies/Genre">
  <w:p w:rsidR="00EC137C" w:rsidRPr="00BF ...
    <w:pPr>
      <w:pStyle w:val="Heading2"/>
    </w:pPr>
    <w:r w:rsidRPr="00BF350E">
      <w:t>
        <!-- Genre Name placeholder replaced by the Genre's Name attribute in the XML data file. -->
        <xsl:value-of select="@name"/>
      </w:t>
    </w:r>
  </w:p>
  <!-- for-each loop added for Movie.  This loop includes the Open XML elements that define the paragraph as a bulleted list. -->
  <xsl:for-each select="Movie">
    <w:p w:rsidR="00EC137C" w:rsidRPr="00EC1 ...
      <w:pPr>
        <w:pStyle w:val="ListParagraph"/>
        <w:numPr>
          <w:ilvl w:val="0"/>
          <w:numId w:val="1"/>
        </w:numPr>
      </w:pPr>
      <w:r w:rsidRPr="00BF350E">
        <w:rPr>
          <w:b/>
        </w:rPr>
        <w:t>
          <!-- Movie Title placeholder replaced by the Movie's Name element in the XML data file. -->
          <xsl:value-of select="Name"/>
        </w:t>
      </w:r>
      <w:r w:rsidR="00C46B60">
        <!-- Year placeholder replaced by the Movie's Released element in the XML data file. -->
        <w:t xml:space="preserve"> (<xsl:value-of select="Released"/>)</w:t>
      </w:r>
    </w:p>
  </xsl:for-each>
</xsl:for-each>
...

Be sure that you save the modified MyMovies.xslt file after you make any changes.
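To see what the two nested xsl:for-each loops produce without opening Word, you can run a trimmed-down style sheet from PHP. This is a hypothetical illustration, not the sample's MyMovies.xslt: it keeps only the loops and the w:p, w:r, and w:t elements (dropping the rsid attributes, styles, and numbering), and it requires the PHP XSL extension (see Configuring PHP to Enable the XSL Extension). Notice that </w:t> follows the closing parenthesis immediately; any line break and indentation placed before </w:t> would be copied into the run text.

```php
<?php
// Hypothetical, trimmed-down style sheet (not the sample's MyMovies.xslt).
function transformMovies($moviesXml)
{
    $stylesheet = <<<'XSLT'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <xsl:template match="/">
    <w:document>
      <w:body>
        <xsl:for-each select="Movies/Genre">
          <!-- One heading paragraph per genre. -->
          <w:p><w:r><w:t><xsl:value-of select="@name"/></w:t></w:r></w:p>
          <xsl:for-each select="Movie">
            <!-- One paragraph per movie in the current genre. -->
            <w:p>
              <w:r><w:t><xsl:value-of select="Name"/></w:t></w:r>
              <w:r><w:t xml:space="preserve"> (<xsl:value-of select="Released"/>)</w:t></w:r>
            </w:p>
          </xsl:for-each>
        </xsl:for-each>
      </w:body>
    </w:document>
  </xsl:template>
</xsl:stylesheet>
XSLT;

    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);
    $data = new DOMDocument();
    $data->loadXML($moviesXml);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);
    // Returns the transformed XML as a string, or false on failure.
    return $processor->transformToXML($data);
}
```

With the MyMovies.xml data, the result contains seven w:p paragraphs: one for each of the three genres and one for each of the four movies.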

Configuring PHP to Enable the XSL Extension

The PHP script code sample in this Visual How To requires the PHP XSL Extension to be enabled. You can follow these steps to enable the XSL Extension.

To configure PHP to enable the XSL extension

  1. In Notepad, open the php.ini file that is located in your PHP installation directory.

  2. Search for php_xsl.

  3. If needed, remove the semicolon (;) comment character from the beginning of the line, for example ;extension=php_xsl.dll. You may need to modify the line to include the relative path to the php_xsl.dll file.

  4. Save the php.ini file.
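After you edit php.ini, you can confirm that the extension is loaded. The sample script also uses DOMDocument (DOM extension) and ZipArchive (zip extension); depending on how your PHP installation was built, the zip extension might also need to be enabled in php.ini. The following is a hypothetical check script, not part of the sample download. Running php -m from the command prompt is another way to list the loaded extensions.

```php
<?php
// Hypothetical check script (not part of the original sample): reports whether
// the extensions that OpenXML.php relies on are loaded.
function checkRequiredExtensions()
{
    $status = array(
        'xsl' => extension_loaded('xsl'), // provides XSLTProcessor
        'zip' => extension_loaded('zip'), // provides ZipArchive
        'dom' => extension_loaded('dom'), // provides DOMDocument
    );
    foreach ($status as $name => $loaded) {
        echo str_pad($name, 5) . ($loaded ? 'loaded' : 'MISSING') . PHP_EOL;
    }
    // True only if every required extension is loaded.
    return !in_array(false, $status, true);
}

checkRequiredExtensions();
```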

Creating a PHP Script Using Notepad

Start Notepad and copy the following PHP code example into a new file. Save the file to the C:\Temp directory with the name OpenXML.php.

<?php

//Declare variables for file names. The names are relative to the
//current directory, so run the script from C:\Temp.
$xmlDataFile = "MyMovies.xml";
$xsltFile = "MyMovies.xslt";
$sourceTemplate = "MyMoviesTemplate.docx";
$outputDocument = "MyMovies.docx";

//Load the xml data and xslt and perform the transformation.
$xmlDocument = new DOMDocument();
if (!$xmlDocument->load($xmlDataFile)) {
  exit("Could not load $xmlDataFile." . PHP_EOL);
}

$xsltDocument = new DOMDocument();
if (!$xsltDocument->load($xsltFile)) {
  exit("Could not load $xsltFile." . PHP_EOL);
}

$xsltProcessor = new XSLTProcessor();
if (!$xsltProcessor->importStylesheet($xsltDocument)) {
  exit("$xsltFile is not a valid XSL Transform file." . PHP_EOL);
}

//After the transformation, $newContent contains
//the XML data in the Open XML Wordprocessing format.
$newContent = $xsltProcessor->transformToXML($xmlDocument);
if ($newContent === false) {
  exit("The transformation failed." . PHP_EOL);
}

//Copy the Word 2007 template document to the output file.
if (copy($sourceTemplate, $outputDocument)) {

  //Open XML files are packaged following the Open Packaging
  //Conventions and can be treated as zip files when
  //accessing their content.
  $zipArchive = new ZipArchive();
  if ($zipArchive->open($outputDocument) !== true) {
    exit("Could not open $outputDocument as a zip archive." . PHP_EOL);
  }

  //Replace the content with the new content created above.
  //In the Open XML Wordprocessing format, content is stored
  //in the document.xml file located in the word directory.
  $zipArchive->addFromString("word/document.xml", $newContent);
  $zipArchive->close();

  echo "Processing Complete" . PHP_EOL;
} else {
  echo "Could not copy $sourceTemplate to $outputDocument." . PHP_EOL;
}
?>

The sample PHP script takes the Word 2007 template document, the XSL Transform file that you created from it, and the XML data file, and creates a new Word 2007 document that contains the XML data. The script uses the XSL Transform file to transform the XML data into the Open XML format, and then copies the template document to the output file, MyMovies.docx. Because Open XML files are packaged following the Open Packaging Conventions and can be treated as .zip files, the script opens the output document as a zip archive. Finally, it replaces the document.xml file in the output document with the transformed content that contains the XML data.

Running the PHP Script from the Windows Command Prompt

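You can run the sample PHP script from the Windows Command Prompt to create a new Word 2007 file that contains the XML data. Because the script uses relative file names, first change to the C:\Temp directory, and then run php OpenXML.php (this assumes that php.exe is on your PATH; otherwise, use its full path).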

Figure 3. Running the PHP script

After you run the sample PHP script, a new Word 2007 document named MyMovies.docx is created in the C:\Temp directory.

Figure 4. Word 2007 Output Document

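To check the result without opening Word, you can read the paragraph text back out of MyMovies.docx. The following is a minimal sketch, assuming the PHP zip and DOM extensions are available and that the main document part is at word/document.xml; readParagraphText is a hypothetical helper name, and it only reads top-level body paragraphs (not paragraphs inside tables).

```php
<?php
// Hypothetical helper (not part of the original sample): returns the text of
// each top-level paragraph in the main document part of a .docx file.
function readParagraphText($docxPath)
{
    $zip = new ZipArchive();
    $result = $zip->open($docxPath);
    if ($result !== true) {
        throw new RuntimeException("Could not open $docxPath (error $result).");
    }
    $content = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($content === false) {
        throw new RuntimeException("$docxPath has no word/document.xml part.");
    }

    $document = new DOMDocument();
    $document->loadXML($content);
    $xpath = new DOMXPath($document);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $paragraphs = array();
    foreach ($xpath->query('//w:body/w:p') as $paragraph) {
        // Concatenate the w:t text of every run in the paragraph.
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $t) {
            $text .= $t->textContent;
        }
        $paragraphs[] = $text;
    }
    return $paragraphs;
}
```

For example, print_r(readParagraphText('C:\Temp\MyMovies.docx')); lists each genre heading and movie line, along with any other paragraphs that the template contains.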

Read It

Starting with a Word 2007 document that already contains the layout makes it easy to create a Word 2007 document that contains XML data. Once the Word 2007 document is created, the contents can be extracted and XSLT elements added to replace values or to repeat information.

The sample PHP script shown with this Visual How To transforms XML data into the Open XML Wordprocessing format using an XSL Transform file that was created from a template Word 2007 document. A new Word 2007 document is then created that has the content based on the XML data.

The following PHP code example shows how you transform XML data into the Open XML format. The code uses DOMDocument objects to load the XML data and the XSLT, and then creates an XSLTProcessor and imports the XSLT as its style sheet. The XSLTProcessor's transformToXML method is then called to transform the XML data into the Open XML Wordprocessing format.

//Load the xml data and xslt and perform the transformation.
$xmlDocument = new DOMDocument();
$xmlDocument->load($xmlDataFile);

$xsltDocument = new DOMDocument();
$xsltDocument->load($xsltFile);

$xsltProcessor = new XSLTProcessor();
$xsltProcessor->importStylesheet($xsltDocument);

//After the transformation, $newContent contains 
//the XML data in the Open XML Wordprocessing format.
$newContent = $xsltProcessor->transformToXML($xmlDocument);

The following PHP example shows how the output Word 2007 document is created. Open XML files are packaged following the Open Packaging Conventions and can be treated as .zip files. The script copies the template document to the output file and then uses a ZipArchive object to open the copy for editing.

//Copy the Word 2007 template document to the output file.
if (copy($sourceTemplate, $outputDocument)) {
  
  //Open XML files are packaged following the Open Packaging 
  //Conventions and can be treated as zip files when 
  //accessing their content.
  $zipArchive = new ZipArchive();
  $zipArchive->open($outputDocument);

The contents of a Word 2007 document are stored in the document.xml file in the word directory of the package. The following PHP code example shows how to update the output document by replacing its document.xml file with the transformed content that contains the XML data, and then closing the ZipArchive to save the changes.

  //Replace the content with the new content created above.
  //In the Open XML Wordprocessing format content is stored
  //in the document.xml file located in the word directory.
  $zipArchive->addFromString("word/document.xml", $newContent);
  $zipArchive->close();
}

Explore It